James Curran (educator)

James Curran (educator)

James R. Curran is an Australian computational linguist. He is the former CEO of Grok Academy and previously a senior lecturer at the University of Sydney. He holds a PhD in Informatics from the University of Edinburgh. == Research == Curran's research focuses on natural language processing (NLP), more specifically combinatory categorial grammar and question answering systems. In addition to his contributions to NLP, Curran has produced a paper on the development of search engines to assist in driving problem based learning. == Works == Curran has co-authored software packages such as C&C tools, a CCG parser (with Stephen Clark). == Educational work == In addition to his work as a University of Sydney lecturer, Curran directed the National Computer Science School, an annual summer school for technologically talented high school students. In 2013, based on their work with NCSS, he, Tara Murphy, Nicky Ringland and Tim Dawborn founded Grok Learning. In 2013 he was one of the authors of the Digital Technologies section of the Australian Curriculum - its first appearance in the national curriculum. Additionally, he acted as an advocate for digital literacy among Australian students. He was the academic director of the Australian Computing Academy, a not-for-profit within the University of Sydney until its merger with Grok Learning in 2021 to form Grok Academy. In 2022, Grok Academy under Curran secured a significant amount of funding from Richard White, founder of WiseTech, with the aim of developing new courses and encouraging other large technology companies to donate likewise. In 2024 Curran cohosted an unreleased children's reality TV show called Future Fixers, which Grok was co-producing. The show was abandoned after other producers learned of pre-existing harassment claims against him. == Sexual harassment allegations == In October 2024, he resigned from his position as CEO and board member of Grok Academy after multiple allegations of harassment were substantiated by an independent investigator. It was reported that over a 10-year span there were nine women, including six who were in high school at the time, that allege Curran sent them inappropriate messages. Additionally, it was revealed that a 2019 University of Sydney investigation found 35 cases of harassment, after which he received a warning and a 2024 University of New South Wales investigation was referred to the NSW police, who took no action as they found no criminal wrongdoing by Curran, in part because the students were over 16 at the time of the alleged harassment. In December 2024, Curran said he was “deeply sorry” for his actions.

Empirical risk minimization

In statistical learning theory, the principle of empirical risk minimization defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice (i.e. the "true risk") because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk". == Background == The following situation is a general setting of many supervised learning problems. There are two spaces of objects X {\displaystyle X} and Y {\displaystyle Y} and we would like to learn a function h : X → Y {\displaystyle \ h:X\to Y} (often called hypothesis) which outputs an object y ∈ Y {\displaystyle y\in Y} , given x ∈ X {\displaystyle x\in X} . To do so, there is a training set of n {\displaystyle n} examples ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} where x i ∈ X {\displaystyle x_{i}\in X} is an input and y i ∈ Y {\displaystyle y_{i}\in Y} is the corresponding response that is desired from h ( x i ) {\displaystyle h(x_{i})} . To put it more formally, assuming that there is a joint probability distribution P ( x , y ) {\displaystyle P(x,y)} over X {\displaystyle X} and Y {\displaystyle Y} , and that the training set consists of n {\displaystyle n} instances ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} drawn i.i.d. from P ( x , y ) {\displaystyle P(x,y)} . The assumption of a joint probability distribution allows for the modelling of uncertainty in predictions (e.g. from noise in data) because y {\displaystyle y} is not a deterministic function of x {\displaystyle x} , but rather a random variable with conditional distribution P ( y | x ) {\displaystyle P(y|x)} for a fixed x {\displaystyle x} . It is also assumed that there is a non-negative real-valued loss function L ( y ^ , y ) {\displaystyle L({\hat {y}},y)} which measures how different the prediction y ^ {\displaystyle {\hat {y}}} of a hypothesis is from the true outcome y {\displaystyle y} . For classification tasks, these loss functions can be scoring rules. The risk associated with hypothesis h ( x ) {\displaystyle h(x)} is then defined as the expectation of the loss function: R ( h ) = E [ L ( h ( x ) , y ) ] = ∫ L ( h ( x ) , y ) d P ( x , y ) . {\displaystyle R(h)=\mathbf {E} [L(h(x),y)]=\int L(h(x),y)\,dP(x,y).} A loss function commonly used in theory is the 0-1 loss function: L ( y ^ , y ) = { 1 if y ^ ≠ y 0 if y ^ = y {\displaystyle L({\hat {y}},y)={\begin{cases}1&{\mbox{ if }}\quad {\hat {y}}\neq y\\0&{\mbox{ if }}\quad {\hat {y}}=y\end{cases}}} . The ultimate goal of a learning algorithm is to find a hypothesis h ∗ {\displaystyle h^{}} among a fixed class of functions H {\displaystyle {\mathcal {H}}} for which the risk R ( h ) {\displaystyle R(h)} is minimal: h ∗ = a r g m i n h ∈ H R ( h ) . {\displaystyle h^{}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,{R(h)}.} For classification problems, the Bayes classifier is defined to be the classifier minimizing the risk defined with the 0–1 loss function. == Formal definition == In general, the risk R ( h ) {\displaystyle R(h)} cannot be computed because the distribution P ( x , y ) {\displaystyle P(x,y)} is unknown to the learning algorithm. However, given a sample of iid training data points, we can compute an estimate, called the empirical risk, by computing the average of the loss function over the training set; more formally, computing the expectation with respect to the empirical measure: R emp ( h ) = 1 n ∑ i = 1 n L ( h ( x i ) , y i ) . {\displaystyle \!R_{\text{emp}}(h)={\frac {1}{n}}\sum _{i=1}^{n}L(h(x_{i}),y_{i}).} The empirical risk minimization principle states that the learning algorithm should choose a hypothesis h ^ {\displaystyle {\hat {h}}} which minimizes the empirical risk over the hypothesis class H {\displaystyle {\mathcal {H}}} : h ^ = a r g m i n h ∈ H R emp ( h ) . {\displaystyle {\hat {h}}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,R_{\text{emp}}(h).} Thus, the learning algorithm defined by the empirical risk minimization principle consists in solving the above optimization problem. == Properties == Guarantees for the performance of empirical risk minimization depend strongly on the function class selected as well as the distributional assumptions made. In general, distribution-free methods are too coarse, and do not lead to practical bounds. However, they are still useful in deriving asymptotic properties of learning algorithms, such as consistency. In particular, distribution-free bounds on the performance of empirical risk minimization given a fixed function class can be derived using bounds on the VC complexity of the function class. For simplicity, considering the case of binary classification tasks, it is possible to bound the probability of the selected classifier, ϕ n {\displaystyle \phi _{n}} being much worse than the best possible classifier ϕ ∗ {\displaystyle \phi ^{}} . Consider the risk L {\displaystyle L} defined over the hypothesis class C {\displaystyle {\mathcal {C}}} with growth function S ( C , n ) {\displaystyle {\mathcal {S}}({\mathcal {C}},n)} given a dataset of size n {\displaystyle n} . Then, for every ϵ > 0 {\displaystyle \epsilon >0} : P ( L ( ϕ n ) − L ( ϕ ∗ ) > ϵ ) ≤ 8 S ( C , n ) exp ⁡ { − n ϵ 2 / 32 } {\displaystyle \mathbb {P} \left(L(\phi _{n})-L(\phi ^{})>\epsilon \right)\leq {\mathcal {8}}S({\mathcal {C}},n)\exp\{-n\epsilon ^{2}/32\}} Similar results hold for regression tasks. These results are often based on uniform laws of large numbers, which control the deviation of the empirical risk from the true risk, uniformly over the hypothesis class. === Impossibility results === It is also possible to show lower bounds on algorithm performance if no distributional assumptions are made. This is sometimes referred to as the No free lunch theorem. Even though a specific learning algorithm may provide the asymptotically optimal performance for any distribution, the finite sample performance is always poor for at least one data distribution. This means that no classifier can improve on the error for a given sample size for all distributions. Specifically, let ϵ > 0 {\displaystyle \epsilon >0} and consider a sample size n {\displaystyle n} and classification rule ϕ n {\displaystyle \phi _{n}} , there exists a distribution of ( X , Y ) {\displaystyle (X,Y)} with risk L ∗ = 0 {\displaystyle L^{}=0} (meaning that perfect prediction is possible) such that: E L n ≥ 1 / 2 − ϵ . {\displaystyle \mathbb {E} L_{n}\geq 1/2-\epsilon .} It is further possible to show that the convergence rate of a learning algorithm is poor for some distributions. Specifically, given a sequence of decreasing positive numbers a i {\displaystyle a_{i}} converging to zero, it is possible to find a distribution such that: E L n ≥ a i {\displaystyle \mathbb {E} L_{n}\geq a_{i}} for all n {\displaystyle n} . This result shows that universally good classification rules do not exist, in the sense that the rule must be low quality for at least one distribution. === Computational complexity === Empirical risk minimization for a classification problem with a 0-1 loss function is known to be an NP-hard problem even for a relatively simple class of functions such as linear classifiers. Nevertheless, it can be solved efficiently when the minimal empirical risk is zero, i.e., data is linearly separable. In practice, machine learning algorithms cope with this issue either by employing a convex approximation to the 0–1 loss function (like hinge loss for SVM), which is easier to optimize, or by imposing assumptions on the distribution P ( x , y ) {\displaystyle P(x,y)} (and thus stop being agnostic learning algorithms to which the above result applies). In the case of convexification, Zhang's lemma majors the excess risk of the original problem using the excess risk of the convexified problem. Minimizing the latter using convex optimization also allow to control the former. == Tilted empirical risk minimization == Tilted empirical risk minimization is a machine learning technique used to modify standard loss functions like squared error, by introducing a tilt parameter. This parameter dynamically adjusts the weight of data points during training, allowing the algorithm to focus on specific regions or characteristics of the data distribution. Tilted empirical risk minimization is particularly useful in scenarios with imbalanced data or when there is a need to emphasize errors in certain parts of the prediction space.

Competition in artificial intelligence

Competition in artificial intelligence refers to the rivalry among companies, research institutions, and governments to develop and deploy the most capable artificial intelligence (AI) systems. The competition spans multiple domains, including large language models (LLMs), autonomous vehicles, robotics, computer vision systems, natural language processing (NLP), and AI-optimized hardware. == Background == Competition in AI is driven by potential economic, strategic, and scientific advantages. Breakthroughs in AI can enhance productivity, enable new products and services, and provide geopolitical leverage. The field has experienced rapid progress since the mid-2010s, particularly in machine learning and artificial neural networks, leading to intense rivalry among leading actors. == Corporate competition == Major technology companies are among the most visible competitors in AI. In the United States, firms such as OpenAI, Google DeepMind, Meta Platforms, Microsoft, Anthropic, and Nvidia compete in building advanced LLMs, generative AI platforms, and AI-optimized graphics processing units (GPUs). In China, companies such as Baidu, Alibaba Group, Tencent, and startups such DeepSeek have become leaders in AI deployment, often with state backing. The "[war for talent]" in AI research has become a defining feature of corporate competition. Leading firms often recruit top AI researchers from rivals, sometimes offering multi-million-dollar compensation packages. == National competition == Governments see leadership in AI as a strategic priority. The United States has funded AI research for military, economic, and societal applications, while China has set a target to lead the world in AI by 2030 through its "New Generation Artificial Intelligence Development Plan". Other nations, including the UK, India, Israel, Russia, South Korea, and members of the European Union, have launched national AI strategies. In February 2026 Anthropic said Chinese companies - DeepSeek, Moonshot AI, and MiniMax - were conducting "distillation attacks" in an attempt to copy their model's capabilities, and warned that business wars were closely tied to geopolitical ones: "foreign labs that illicitly distill American models can remove safeguards, feeding model capabilities into their own military, intelligence, and surveillance systems." == Sectors of competition == === Large language models and chatbots competition === Competition to produce the most capable generative text models, with benchmarks such as MMLU and ARC used to evaluate performance has been on scale since the emergence of AI. These systems leverage deep learning, especially transformer architectures, to understand and generate human-like language. Companies and research groups globally compete to develop chatbots that are more capable, reliable, and context-aware. Among the most well-known chatbots is ChatGPT, developed by OpenAI. Since its public release in 2022, ChatGPT has rapidly gained widespread attention for its ability to engage in coherent and versatile conversations, assist with creative writing, and solve complex problems. In response, technology firms introduced competing chatbots aiming to challenge or surpass ChatGPT's capabilities. Notably, DeepSeek, a Chinese AI company, launched an advanced chatbot integrated with their R1 language model, emphasizing strong natural language understanding and multilingual support. Similarly, Grok, developed by xAI (company), integrates conversational AI into vehicles and digital assistants, combining natural language processing with real-time data for personalized user interaction. These chatbots not only compete in language tasks but also demonstrate strategic reasoning capabilities by playing complex games such as chess and Go. This form of competition is reminiscent of historic AI milestones set by programs such as Deep Blue and AlphaGo. The OpenAI’s ChatGPT has been tested in playing chess at various levels, while DeepSeek’s chatbot showcased its prowess in online chess tournaments in early 2024, winning several matches against human and AI opponents. Grok, leveraging Tesla's vast data infrastructure, has demonstrated real-time strategic decision-making in simulation environments that include chess-like games. The competition pushes rapid innovation, with firms racing to improve chatbot conversational depth, reduce biases, increase factual accuracy, and integrate multimodal inputs like images and videos. At the same time, the competition raises questions about AI safety, ethical use, and the societal impacts of increasingly human-like chatbots. === Autonomous vehicles === Companies such as Waymo, Tesla, and Baidu are racing to deploy safe and reliable self-driving car technology. === AI chips === Rivalry between Nvidia, AMD, Intel, and Huawei in designing processors optimized for AI workloads. === Military applications === Development of AI-enabled drones, surveillance systems, and decision-support tools, with associated ethical debates. == Events == In 2023, OpenAI released GPT-4, prompting competitors such as Google DeepMind to accelerate the release of their own models, including Gemini. In 2024, Chinese AI company DeepSeek launched the R1 model, leading OpenAI to release an open-source system, GPT-OSS, as a strategic countermeasure. In 2022, Tesla and Waymo both expanded autonomous taxi services in U.S. cities, competing for regulatory approval and public trust. The U.S. Department of Defense's Project Maven and China's AI-enabled surveillance programs have been cited as examples of military AI rivalry. In 2025, Microsoft hired several senior engineers from Google DeepMind, highlighting the ongoing "talent poaching" competition in the AI sector. == Risks and concerns == Critics warn that unrestrained competition in AI can undermine safety, ethics, and governance. Concerns include the proliferation of biased or unsafe models, escalation in autonomous weapons, and reduced cooperation on safety standards.

Artificial general intelligence

Artificial general intelligence (AGI) is a hypothetical type of artificial intelligence that matches or surpasses human capabilities across virtually all cognitive tasks. Beyond AGI, artificial superintelligence (ASI) would outperform the best human abilities across every domain by a wide margin. Unlike artificial narrow intelligence (ANI), whose competence is confined to well‑defined tasks, an AGI system can generalise knowledge, transfer skills between domains, and solve novel problems without task‑specific reprogramming. Creating AGI is a stated goal of technology companies such as OpenAI, Google, xAI, and Meta. A 2020 survey identified 72 active AGI research and development projects across 37 countries. AGI is a common topic in science fiction and futures studies. Contention exists over whether AGI represents an existential risk. Some AI experts and industry figures have stated that mitigating the risk of human extinction posed by AGI should be a global priority. Others find the development of AGI to be in too remote a stage to present such a risk. == Terminology == AGI is also known as strong AI, full AI, human-level AI, human-level intelligent AI, or general intelligent action. The term "artificial general intelligence" was used in 1997 by Mark Gubrud in a discussion of the implications of fully automated military production and operations. A mathematical formalism of AGI named AIXI was proposed in 2000 by Marcus Hutter, who defines intelligence as "an agent’s ability to achieve goals or succeed in a wide range of environments". This type of AGI has also been called "universal artificial intelligence". The term AGI was re-introduced and popularized by Shane Legg and Ben Goertzel around 2002. Some academic sources reserve the term "strong AI" for computer programs that will experience sentience or consciousness. In contrast, weak AI (or narrow AI) can solve a specific problem but lacks general cognitive abilities. Some academic sources use "weak AI" to refer more broadly to any programs that neither experience consciousness nor have a mind in the same sense as humans. Related concepts include artificial superintelligence and transformative AI. An artificial superintelligence (ASI) is a hypothetical type of AGI that is much more generally intelligent than humans, while the notion of transformative AI relates to AI having a large impact on society, for example, similar to the agricultural or industrial revolution. A framework for classifying AGI was proposed in 2023 by Google DeepMind researchers. They define five performance levels of AGI: emerging, competent, expert, virtuoso, and superhuman. For example, a competent AGI is defined as an AI that outperforms 50% of skilled adults in a wide range of non-physical tasks, and a superhuman AGI (i.e., an artificial superintelligence) is similarly defined but with a threshold of 100%. They consider large language models like ChatGPT or LLaMA 2 to be instances of emerging AGI (comparable to unskilled humans). Regarding the autonomy of AGI and associated risks, they define five levels: tool (fully in human control), consultant, collaborator, expert, and agent (fully autonomous). == Characteristics == There is no single agreed-upon definition of intelligence as applied to computers. Computer scientist John McCarthy wrote in 2007: "We cannot yet characterize in general what kinds of computational procedures we want to call intelligent." === Intelligence traits === Researchers generally hold that a system is required to do all of the following to be regarded as an AGI: reason, use strategy, solve puzzles, and make judgments under uncertainty, represent knowledge, including common sense knowledge, plan, learn, communicate in natural language, if necessary, integrate these skills in completion of any given goal. Many interdisciplinary approaches (e.g. cognitive science, computational intelligence, and decision making) consider additional traits such as imagination (the ability to form novel mental images and concepts) and autonomy. Computer-based systems exhibiting these capabilities are now widespread, with modern large language models demonstrating computational creativity, automated reasoning, and decision support simultaneously across domains. === Physical traits === Other capabilities are considered desirable in intelligent systems, as they may affect intelligence or aid in its expression. These include: the ability to sense (e.g. see, hear, etc.), and the ability to act (e.g. move and manipulate objects, change location to explore, etc.) This includes the ability to detect and respond to hazard. === Tests for human-level AGI === Several tests meant to confirm human-level AGI have been considered. ==== Turing test ==== The Turing test was proposed by Alan Turing in his 1950 paper "Computing Machinery and Intelligence". This test involves a human judge engaging in natural language conversations with both a human and a machine designed to generate human-like responses. The machine passes the test if it can convince the judge that it is human a significant fraction of the time. Turing proposed this as a practical measure of machine intelligence, focusing on the ability to produce human-like responses rather than on the internal workings of the machine. The idea of the test is that the machine has to try and pretend to be a man, by answering questions put to it, and it will only pass if the pretence is reasonably convincing. A considerable portion of a jury, who should not be experts about machines, must be taken in by the pretence. In 2014, a chatbot named Eugene Goostman, designed to imitate a 13-year-old Ukrainian boy, reportedly passed a Turing Test event by convincing 33% of judges that it was human. However, this claim was met with significant skepticism from the AI research community, who questioned the test's implementation and its relevance to AGI. A 2025 pre‑registered, three‑party Turing‑test study by Cameron R. Jones and Benjamin K. Bergen showed that GPT-4.5 was judged to be the human in 73% of five‑minute text conversations—surpassing the 67% humanness rate of real confederates and meeting the researchers' criterion for having passed the test. ==== Ikea test ==== The "Ikea test", also known as the Flat Pack Furniture Test, involves an AI controlling a robot which attempts to assemble an Ikea flat-pack furniture product after having been shown the parts and instructions. As early as 2013, MIT's IkeaBot demonstrated fully autonomous multi-robot assembly of an IKEA Lack table in ten minutes, with no human intervention and no pre-programmed assembly instructions. The robots inferred the assembly sequence from the geometry of the parts alone. ==== Coffee test ==== Steve Wozniak proposed a test where a machine is required to enter an average American home and figure out how to make coffee. It must find the coffee machine, find the coffee, add water, find a mug, and brew the coffee by pushing the proper buttons. This test has been substantially approached across multiple systems. In January 2024, Figure AI's Figure 01 humanoid learned to operate a Keurig coffee machine autonomously after watching video demonstrations, using end-to-end neural networks to translate visual input into motor actions. In 2025, researchers at the University of Edinburgh published the ELLMER framework in Nature Machine Intelligence, demonstrating a robotic arm that interprets verbal instructions, analyses its surroundings, and autonomously makes coffee in dynamic kitchen environments — adapting to unforeseen obstacles in real time rather than following pre-programmed sequences. ==== Suleyman's test ==== Mustafa Suleyman's test proposes giving an AI model US$100,000 and asking it to obtain US$1 million. ==== Use of video-games ==== Adams, et al. propose that the ability to learn and succeed in a wide range of video games can be used to test AI intelligence. This range would include games unknown to the AGI developers before the test is administered. === AI-complete problems === A problem is informally called "AI-complete" or "AI-hard" if it is believed that AGI would be needed to solve it, because the solution is beyond the capabilities of a purpose-specific algorithm. == History == === Classical AI === Modern AI research began in the mid-1950s. The first generation of AI researchers were convinced that artificial general intelligence was possible and that it would exist in just a few decades. AI pioneer Herbert A. Simon wrote in 1965: "machines will be capable, within twenty years, of doing any work a man can do". Their predictions were the inspiration for Stanley Kubrick and Arthur C. Clarke's fictional character HAL 9000, who embodied what AI researchers believed they could create by the year 2001. AI pioneer Marvin Minsky was a consultant on the project of making HAL 9000 as realistic as possible according to the consensus predictions of the time. He said in 1967, "Within a generation... the problem of

Neural computation

Neural computation is the information processing performed by networks of neurons. Neural computation is affiliated with the philosophical tradition of computationalism, which advances the thesis that neural computation explains cognition. Warren McCulloch and Walter Pitts were the first to propose an account of neural activity as being computational in their seminal 1943 paper "A Logical Calculus of the Ideas Immanent in Nervous Activity." There are three general branches of computationalism, including classicism, connectionism, and computational neuroscience. All three branches agree that cognition is computation, however, they disagree on what sorts of computations constitute cognition. The classicism tradition believes that computation in the brain is digital, analogous to digital computing. Both connectionism and computational neuroscience do not require that the computations that realize cognition are necessarily digital computations. However, the two branches greatly disagree upon which sorts of experimental data should be used to construct explanatory models of cognitive phenomena. Connectionists rely upon behavioral evidence to construct models to explain cognitive phenomena, whereas computational neuroscience leverages neuroanatomical and neurophysiological information to construct mathematical models that explain cognition. When comparing the three main traditions of the computational theory of mind, as well as the different possible forms of computation in the brain, it is helpful to define what we mean by computation in a general sense. Computation is the processing of information, otherwise known as variables or entities, according to a set of rules. A rule in this sense is simply an instruction for executing a manipulation on the current state of the variable, in order to produce a specified output. In other words, a rule dictates which output to produce given a certain input to the computing system. A computing system is a mechanism whose components must be functionally organized to process the information in accordance with the established set of rules. The types of information processed by a computing system determine which type of computations it performs. Traditionally in cognitive science, there have been two proposed types of computation related to neural activity, digital and analog, with the vast majority of theoretical work incorporating a digital understanding of cognition. Computing systems that perform digital computation are functionally organized to execute operations on strings of digits with respect to the type and location of the digit on the string. It has been argued that neural spike train signaling implements some form of digital computation, since neural spikes may be considered as discrete units or digits, like 0 or 1—the neuron either fires an action potential or it does not. Accordingly, neural spike trains could be seen as strings of digits. Alternatively, analog computing systems perform manipulations on non-discrete, irreducibly continuous variables, that is, entities that vary continuously as a function of time. These sorts of operations are characterized by systems of differential equations. Neural computation can be studied by, for example, building models of neural computation. Work on artificial neural networks has been somewhat inspired by knowledge of neural computation.

Automated storage and retrieval system

An automated storage and retrieval system (ASRS or AS/RS) consists of a variety of computer-controlled systems for automatically placing and retrieving loads from defined storage locations. Automated storage and retrieval systems (AS/RS) are typically used in applications where: There is a very high volume of loads being moved into and out of storage Storage density is important because of space constraints No value is added in this process (no processing, only storage and transport) Accuracy is critical because of potential expensive damages to the load An AS/RS can be used with standard loads as well as nonstandard loads, meaning that each standard load can fit in a uniformly-sized volume; for example, the film canisters in the image of the Defense Visual Information Center are each stored as part of the contents of the uniformly sized metal boxes, which are shown in the image. Standard loads simplify the handling of a request of an item. In addition, audits of the accuracy of the inventory of contents can be restricted to the contents of an individual metal box, rather than undergoing a top-to-bottom search of the entire facility, for a single item. They can also be used in self storage places. == Overview == AS/RS systems are designed for automated storage and retrieval of parts and items in manufacturing, distribution, retail, wholesale and institutions. They first originated in the 1960s, initially focusing on heavy pallet loads but with the evolution of the technology the handled loads have become smaller. The systems operate under computerized control, maintaining an inventory of stored items. Retrieval of items is accomplished by specifying the item type and quantity to be retrieved. The computer determines where in the storage area the item can be retrieved from and schedules the retrieval. It directs the proper automated storage and retrieval machine (SRM) to the location where the item is stored and directs the machine to deposit the item at a location where it is to be picked up. A system of conveyors and or automated guided vehicles is sometimes part of the AS/RS system. These take loads into and out of the storage area and move them to the manufacturing floor or loading docks. To store items, the pallet or tray is placed at an input station for the system, the information for inventory is entered into a computer terminal and the AS/RS system moves the load to the storage area, determines a suitable location for the item, and stores the load. As items are stored into or retrieved from the racks, the computer updates its inventory accordingly. The benefits of an AS/RS system include reduced labor for transporting items into and out of inventory, reduced inventory levels, more accurate tracking of inventory, and space savings. Items are often stored more densely than in systems where items are stored and retrieved manually. Within the storage, items can be placed on trays or hang from bars, which are attached to chains/drives in order to move up and down. The equipment required for an AS/RS include a storage & retrieval machine (SRM) that is used for rapid storage and retrieval of material. SRMs are used to move loads vertically or horizontally, and can also move laterally to place objects in the correct storage location. The trend towards Just In Time production often requires sub-pallet level availability of production inputs, and AS/RS is a much faster way of organizing the storage of smaller items next to production lines. The Material Handling Institute of America (MHIA), the non-profit trade association for the material handling world, and its members have categorised AS/RS into two primary segments: Fixed Aisle and Carousels/Vertical Lift Modules (VLMs). Both sets of technologies provide automated storage and retrieval for parts and items, but use different technologies. Each technology has its unique set of benefits and disadvantages. Fixed Aisle systems are characteristically larger systems whereas carousels and Vertical Lift Modules are used individually or grouped, but in small to medium-sized applications. A fixed-aisle AS/R machine (stacker crane) is one of two main designs: single-masted or double masted. Most are supported on a track and ceiling guided at the top by guide rails or channels to ensure accurate vertical alignment, although some are suspended from the ceiling. The 'shuttles' that make up the system travel between fixed storage shelves to deposit or retrieve a requested load (ranging from a single book in a library system to a several ton pallet of goods in a warehouse system). The entire unit moves horizontally within an aisle, while the shuttles are able to elevate up to the necessary height to reach the load, and can extend and retract to store or retrieve loads that are several positions deep in the shelving. A semi-automated system can be achieved by utilizing only specialized shuttles within an existing rack system. Another AS/RS technology is known as shuttle technology. In this technology the horizontal movement is made by independent shuttles each operating on one level of the rack while a lift at a fixed position within the rack is responsible for the vertical movement. By using two separate machines for these two axes the shuttle technology is able to provide higher throughput rates than stacker cranes. Storage and Retrieval Machines pick up or drop off loads to the rest of the supporting transportation system at specific stations, where inbound and outbound loads are precisely positioned for proper handling. In addition, there are several types of Automated Storage & Retrieval Systems (AS/RS) devices called Unit-load AS/RS, Mini-load AS/RS, Mid-Load AS/RS, Vertical Lift Modules (VLMs), Horizontal Carousels and Vertical Carousels. These systems are used either as stand-alone units or in integrated workstations called pods or systems. These units are usually integrated with various types of pick to light systems and use either a microprocessor controller for basic usage or inventory management software. These systems are ideal for increasing space utilization up to 90%, productivity levels by 90%, accuracy to 99.9%+ levels and throughput up to 750 lines per hour/per operator or more depending on the configuration of the system. == Horizontal carousels == Robotic Inserter/Extractor devices can be used for horizontal carousels. The robotic device is positioned in the front or rear of up to three horizontal carousels tiered high. The robot grabs the tote required in the order and often replenishes at the same time to speed up throughput. The tote(s) are then delivered to a conveyor, which routes it to a work station for picking or replenishing. Up to eight transactions per minute per unit can be done. Totes or containers up to 36" x 36" x 36" can be used in a system. On a simplistic level, horizontal carousels are also often used as "rotating shelving". With simple "fetch" command, items are brought to the operator and otherwise wasted space is eliminated. AS/RS Applications: Most applications of AS/RS technology have been associated with warehousing and distribution operations. An AS/RS can also be used to store raw materials and work in process in manufacturing. Three application areas can be distinguished for AS/RS: (1) Unit load storage and handling, (2) Order picking, and (3) Work in process storage. Unit load storage and retrieval applications are represented by unit load AS/RS and deep-lane storage systems. These kinds of applications are commonly found in warehousing for finishing goods in a distribution center, rarely in manufacturing. Deep-lane systems are used in the food industry. As described above, order picking involves retrieving materials in less than full unit load quantities. Minilpass, man-on board, and items retrieval systems are used for this second application area. Work in process storage is a more recent application of automated storage technology. While it is desirable to minimize the amount of work in process, WIP is unavoidable and must be effectively managed. Automated storage systems, either automated storage/retrieval systems or carousel systems, represent an efficient way to store materials between processing steps, particularly in batch and job shop production. In high production, work in process is often carried between operations by conveyor system, which this serve both storage and transport functions. === Inventory Category-specific AS/RS === Each inventory category—raw materials, work-in-process, and finished goods—requires its own specialized Automated Storage and Retrieval System (AS/RS). Particularly for work-in-process (WIP) inventories, due to variations in manufacturing processes, the AS/RS systems are significantly different in design and function, tailored specifically to match unique handling, storage, and retrieval requirements === Installed applications === Installed applications of this technology can be wide-ranging. In some librarie

Hyperparameter optimization

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, which must be configured before the process starts. Hyperparameter optimization determines the set of hyperparameters that yields an optimal model which minimizes a predefined loss function on a given data set. The objective function takes a set of hyperparameters and returns the associated loss. Cross-validation is often used to estimate this generalization performance, and therefore choose the set of values for hyperparameters that maximize it. == Approaches == === Grid search === The traditional method for hyperparameter optimization has been grid search, or a parameter sweep, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a hold-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search. For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of "reasonable" values for each, say C ∈ { 10 , 100 , 1000 } {\displaystyle C\in \{10,100,1000\}} γ ∈ { 0.1 , 0.2 , 0.5 , 1.0 } {\displaystyle \gamma \in \{0.1,0.2,0.5,1.0\}} Grid search then trains an SVM with each pair (C, γ) in the Cartesian product of these two sets and evaluates their performance on a held-out validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure. Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because the hyperparameter settings it evaluates are typically independent of each other. === Random search === Random Search replaces the exhaustive enumeration of all combinations by selecting them randomly. This can be simply applied to the discrete setting described above, but also generalizes to continuous and mixed spaces. A benefit over grid search is that random search can explore many more values than grid search could for continuous hyperparameters. It can outperform Grid search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm. In this case, the optimization problem is said to have a low intrinsic dimensionality. Random Search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample. Despite its simplicity, random search remains one of the important base-lines against which to compare the performance of new hyperparameter optimization methods. === Bayesian optimization === Bayesian optimization is a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating it, Bayesian optimization aims to gather observations revealing as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected close to the optimum). In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run. === Gradient-based optimization === For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks. Since then, these methods have been extended to other models such as support vector machines or logistic regression. A different approach in order to obtain a gradient with respect to hyperparameters consists in differentiating the steps of an iterative optimization algorithm using automatic differentiation. A more recent work along this direction uses the implicit function theorem to calculate hypergradients and proposes a stable approximation of the inverse Hessian. The method scales to millions of hyperparameters and requires constant memory. In a different approach, a hypernetwork is trained to approximate the best response function. One of the advantages of this method is that it can handle discrete hyperparameters as well. Self-tuning networks offer a memory efficient version of this approach by choosing a compact representation for the hypernetwork. More recently, Δ-STN has improved this method further by a slight reparameterization of the hypernetwork which speeds up training. Δ-STN also yields a better approximation of the best-response Jacobian by linearizing the network in the weights, hence removing unnecessary nonlinear effects of large changes in the weights. Apart from hypernetwork approaches, gradient-based methods can be used to optimize discrete hyperparameters also by adopting a continuous relaxation of the parameters. Such methods have been extensively used for the optimization of architecture hyperparameters in neural architecture search. === Evolutionary optimization === Evolutionary optimization is a methodology for the global optimization of noisy black-box functions. In hyperparameter optimization, evolutionary optimization uses evolutionary algorithms to search the space of hyperparameters for a given algorithm. Evolutionary hyperparameter optimization follows a process inspired by the biological concept of evolution: Create an initial population of random solutions (i.e., randomly generate tuples of hyperparameters, typically 100+) Evaluate the hyperparameter tuples and acquire their fitness function (e.g., 10-fold cross-validation accuracy of the machine learning algorithm with those hyperparameters) Rank the hyperparameter tuples by their relative fitness Replace the worst-performing hyperparameter tuples with new ones generated via crossover and mutation Repeat steps 2-4 until satisfactory algorithm performance is reached or is no longer improving. Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms, automated machine learning, typical neural network and deep neural network architecture search, as well as training of the weights in deep neural networks. === Population-based === Population Based Training (PBT) learns both hyperparameter values and network weights. Multiple learning processes operate independently, using different hyperparameters. As with evolutionary methods, poorly performing models are iteratively replaced with models that adopt modified hyperparameter values and weights based on the better performers. This replacement model warm starting is the primary differentiator between PBT and other evolutionary methods. PBT thus allows the hyperparameters to evolve and eliminates the need for manual hypertuning. The process makes no assumptions regarding model architecture, loss functions or training procedures. PBT and its variants are adaptive methods: they update hyperparameters during the training of the models. On the contrary, non-adaptive methods have the sub-optimal strategy to assign a constant set of hyperparameters for the whole training. === Early stopping-based === A class of early stopping-based hyperparameter optimization algorithms is purpose-built for large search spaces of continuous and discrete hyperparameters, particularly when the computational cost to evaluate the performance of a set of hyperparameters is high. Irace implements the iterated racing algorithm, that focuses the search around the most promising configurations, using statistical tests to discard the ones that perform poorly. Another early stopping hyperparameter optimization algorithm is successive halving (SHA), which begins as a random search but periodically prunes low-performing models, thereby focusing computational resources on more promising models. Asynchronous successive halving (ASHA) further improves upon SHA's resource utilization profile by removing the need to synchronously evaluate a