Making the Best AI Accessible to All

September 5, 2022
From Lab to Market

We are computer engineers and scientists with decades of combined research and industry experience in artificial intelligence (AI) and machine learning (ML) acceleration, ranging from data science to processor design. Working together as researchers at Harvard University’s Architecture Circuits and Compilers Lab, we researched and published state-of-the-art ML systems in both cloud and edge settings. Building ML systems for the highest performance in accuracy, speed, and energy efficiency is a painstaking process, even for experienced engineers like us. In our research, we realized that other companies need a bespoke solution that gives them access to acceleration technology so they can achieve cutting-edge, production-grade performance. We decided to bring our passion for acceleration and our technology from the lab to the market by forming Stochastic, which builds software that makes the best AI accessible to everyone. To level the playing field of AI, we are democratizing AI acceleration, the core technology needed to build efficient AI and ML systems, which is currently monopolized by a few big tech companies.

State-of-the-Art AI Challenges

The Introduction of Transformers

One of the most exciting advancements in AI was the Transformer architecture, a deep learning architecture introduced by Google in 2017 that has taken the unstructured text data analytics space by storm. Transformers enabled a new paradigm in deep learning development: pre-trained models, open-sourced by companies like Google and Microsoft, could be fine-tuned on individual datasets.

Transformers provide remarkable performance on tasks such as text generation, classification, and translation, with less engineering and data science effort than alternatives. Libraries that provide high-level abstractions of these models have made them accessible to a much wider audience. Virtually all companies operating in text analytics are working with transformers or planning to do so. However, deploying extremely large models like transformers creates new challenges in meeting latency and cost requirements, especially when scaling these models to process growing amounts of data.

Scaling Challenges

The primary challenge of working with transformers is the compute and memory requirements of the models. Deep learning has always been known for its increasingly demanding computation costs. Transformers have continued this trend, with increases in model size that require exponentially growing compute power to run. There is a scaling war among the largest global tech companies, such as Google, Microsoft, and Meta, which keep releasing larger models that are more accurate on natural language understanding and processing tasks.

Figure 1. How Transformer large language models are scaling at a much faster rate (750 times every two years) than Moore’s Law (compute performance doubling every two years). [1]
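
To make the two rates in the Figure 1 caption concrete, here is a minimal arithmetic sketch. It assumes both trends compound smoothly and takes the 750x and 2x factors per two-year period directly from the caption. The function names are illustrative only and do not come from any Stochastic product.

```python
import math

# Growth factors per two-year period, taken from the Figure 1 caption.
MODEL_FACTOR = 750.0  # Transformer model scaling: 750x every two years
MOORE_FACTOR = 2.0    # Moore's Law as stated in the caption: 2x every two years
PERIOD_YEARS = 2.0


def growth_over(years: float, factor_per_period: float) -> float:
    """Total growth after `years`, assuming smooth compounding."""
    return factor_per_period ** (years / PERIOD_YEARS)


def gap_after(years: float) -> float:
    """How many times model scaling outpaces Moore's Law after `years`."""
    return growth_over(years, MODEL_FACTOR) / growth_over(years, MOORE_FACTOR)


def doubling_time_months(factor_per_period: float) -> float:
    """Months needed to double at the given per-period growth factor."""
    return 12.0 * PERIOD_YEARS * math.log(2.0) / math.log(factor_per_period)


if __name__ == "__main__":
    for years in (2, 4, 6):
        print(f"after {years} years the gap is {gap_after(years):,.0f}x")
    print(f"model doubling time: {doubling_time_months(MODEL_FACTOR):.1f} months")
    print(f"Moore's Law doubling time: {doubling_time_months(MOORE_FACTOR):.1f} months")
```

Under these assumptions, model scaling doubles roughly every two and a half months, against every 24 months for Moore's Law. The gap therefore widens by a factor of 375 every two years.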

Operations Challenges

Once a model is trained, deploying it to production is a difficult process that involves many surrounding software components, such as model monitoring and updating. Figure 2 shows the software components needed to deploy a model to production.

Figure 2. Required surrounding software components for production ML systems [2,3]

Gartner's recent research shows that only half of AI projects make it from prototype to production [4]. A lack of production-grade AI pipeline tools prevents many companies from scaling their AI projects.


We built Stochastic to solve these challenges so companies can scale their machine learning models effectively and cost-efficiently. We help companies spend less time figuring out how to optimize their models and reduce expenses, so they can spend more time with their models in production. We designed our products with machine learning experts and data scientists in mind, with a focus on solving common deployment and model optimization issues.


At Stochastic, we believe everyone should have access to the best artificial intelligence. We make generative AI optimization and deployment effortless, so anyone can ship state-of-the-art AI models with production-grade performance. Our journey and products reflect our vision of increasing access to the best machine learning models.


Our flagship artificial intelligence acceleration platform, Stochastic X, automates the time-consuming and expensive process of model compression and optimization for ready-to-deploy or already deployed models. Sign up to access Stochastic X here.

For more information, contact us at


  1. Gholami A, Yao Z, Kim S, Mahoney MW, Keutzer K. AI and Memory Wall. RiseLab Medium Blog Post, University of California Berkeley, March 29, 2021.


  3. Sculley D. et al. Hidden Technical Debt in Machine Learning Systems. NeurIPS 2015.