San Francisco-based Datasaur, an AI startup specializing in text and audio labeling for AI projects, today announced the launch of LLM Lab, a comprehensive one-stop shop to help teams build and train custom large language model applications like ChatGPT.
Available for both cloud and on-premises deployments, the Lab gives enterprises a starting point to build their internal custom generative AI applications without worrying about the business and data privacy risks that often stem from third-party services. It also gives teams more control over their projects.
“We’ve built a tool that holistically addresses the most common pain points, supports rapidly evolving best practices, and applies our signature design philosophy to simplify and streamline the process. Over the past year, we have constructed and delivered custom models for our own internal use and our clients, and from that experience, we were able to create a scalable, easy-to-use LLM product,” Ivan Lee, CEO and founder of Datasaur, said in a statement.
What Datasaur LLM Lab brings to the table
Since its launch in 2019, Datasaur has helped enterprise teams execute data labeling for AI and NLP by continuously working on and evolving a comprehensive data annotation platform. Now, that work is culminating in the LLM Lab.
“This tool extends beyond Datasaur’s existing offerings, which primarily focus on traditional Natural Language Processing (NLP) methods like entity recognition and text classification,” Lee wrote in an email to VentureBeat. “LLMs are a powerful new evolution of [NLP] technology and we want to continue serving as the industry’s turnkey solution for all text, document, and audio-related AI applications.”
In its current form, the offering provides an all-in-one interface for handling the different aspects of building an LLM application: internal data ingestion, data preparation, retrieval-augmented generation (RAG), embedding model selection, similarity search optimization, enhancing the LLM’s responses and optimizing server costs. Lee says the product is built around the principles of modularity, composability, simplicity and maintainability.
“This (approach) efficiently handles various text embeddings, vector databases and foundation models. The LLM space is constantly changing and it’s important to create a technology-agnostic platform that allows users to swap different technologies in and out as they strive to develop the best possible solution for their own use cases,” he added.
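The swap-in, swap-out idea Lee describes can be illustrated with a small sketch. This is not Datasaur’s code or API: the `TextModel` interface, the two stand-in model classes and the `App` wrapper are all hypothetical names, chosen only to show how an application can depend on an interface rather than on a specific model.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one foundation model (assumption, for illustration)."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """Stand-in for a second, interchangeable foundation model."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


class App:
    """The application only knows the TextModel interface, not the vendor."""

    def __init__(self, model: TextModel) -> None:
        self.model = model

    def ask(self, question: str) -> str:
        return self.model.generate(question)


# Swapping the model requires no change to the application code.
first = App(EchoModel()).ask("hi")
second = App(UpperModel()).ask("hi")
```

The same pattern would apply to embedding models and vector databases: each sits behind its own small interface so it can be replaced independently.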
To get started with the LLM Lab, users pick a foundation model and adjust its settings (temperature, maximum length, etc.).
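As a rough illustration of what such a configuration might look like in code, the sketch below bundles a model name with temperature and maximum length. The class name, field names, default values and the 0.0 to 2.0 temperature range are assumptions made for this example, not settings documented by Datasaur.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical settings bundle for one foundation model."""

    model: str
    temperature: float = 0.7  # assumed default; higher means more varied output
    max_tokens: int = 256     # assumed default cap on completion length

    def __post_init__(self) -> None:
        # The 0.0-2.0 range is an assumption; actual limits vary by provider.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")


# Start from a baseline, then derive a more deterministic variant to compare.
baseline = GenerationConfig(model="llama-2")
precise = replace(baseline, temperature=0.2, max_tokens=128)
```

Keeping the configuration immutable makes it straightforward to compare several variants side by side without one experiment altering another.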
Supported models include Meta’s Llama 2, Falcon from the Technology Innovation Institute in Abu Dhabi, and Anthropic’s Claude. Pinecone is supported as a vector database.
Next, they choose prompt templates and test sample prompts to see which work best for their use case. They can also upload documents for RAG.
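For readers unfamiliar with RAG, the sketch below shows the basic mechanics: rank uploaded documents by similarity to the question, then insert the best match into a prompt template. It is a toy under stated assumptions. The word-count "embedding" stands in for a real embedding model, a plain list stands in for a vector database, and the template wording and function names are invented for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical prompt template; real templates would be tuned per use case.
PROMPT_TEMPLATE = (
    "Answer using only the context.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Fill the template with the retrieved context and the question."""
    context = "\n".join(retrieve(question, documents, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


docs = [
    "Refunds are processed within 14 days.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("how long do refunds take", docs)
```

Testing different templates then amounts to changing `PROMPT_TEMPLATE` and comparing the completions the chosen model returns.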
Once these steps are complete, they settle on the configuration that best balances quality and performance, then deploy the application. Later, as the application is used, they can evaluate prompt/completion pairs through rating and ranking projects and feed the results back into the model for fine-tuning or reinforcement learning from human feedback (RLHF).
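The feedback step can be pictured as a simple filter over rated pairs. The sketch below is an assumption-laden illustration, not Datasaur’s pipeline: the `RatedPair` structure, the 1 to 5 rating scale and the 4.0 threshold are all hypothetical, and it shows only the selection of highly rated examples for supervised fine-tuning, not RLHF itself.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RatedPair:
    """Hypothetical record: one prompt/completion pair plus reviewer scores."""

    prompt: str
    completion: str
    ratings: list[int]  # assumed 1-5 scale, one score per reviewer


def select_for_fine_tuning(
    pairs: list[RatedPair], min_mean_rating: float = 4.0
) -> list[dict[str, str]]:
    """Keep pairs whose average rating meets the (assumed) threshold."""
    return [
        {"prompt": p.prompt, "completion": p.completion}
        for p in pairs
        if p.ratings and mean(p.ratings) >= min_mean_rating
    ]


pairs = [
    RatedPair("What is our refund window?", "14 days.", [5, 4, 5]),
    RatedPair("What is our refund window?", "I am not sure.", [2, 1, 2]),
    RatedPair("Who is the CEO?", "Unrated answer.", []),
]
training_examples = select_for_fine_tuning(pairs)
```

Pairs without any ratings are skipped rather than treated as good or bad, so unreviewed output never reaches the training set by default.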
Breaking technical barriers
While Lee did not share how many companies are testing the new LLM Lab, he did note that the feedback has been positive so far.
Michell Handaka, the founder and CEO of GLAIR.ai, one of the company’s customers, noted that the Lab bridges communication gaps between engineering and non-engineering teams and breaks down technical barriers in developing LLM applications, enabling them to scale the development process easily.
So far, Datasaur has helped enterprises in critical sectors, such as finance, legal and healthcare, turn raw unstructured data into valuable ML datasets. Some big names currently working with the company are Qualtrics, Ontra, Consensus, LegalTech and Von Wobeser y Sierra.
“We have been able to support forward-thinking industry leaders…and are on track to 5x revenue in 2024,” Lee emphasized.
What’s next for Datasaur and its LLM Lab
In the coming year, the company plans to build up the Lab and invest more in LLM development at the enterprise level.
Users of the product will be able to save their most successful configurations and prompts and share the findings with colleagues.
The Lab will support new and up-and-coming foundation models, as well.
Overall, the product is expected to make a significant impact given the growing need for custom and privacy-focused LLM applications. In the recent LLM Survey report for 2023, nearly 62% of the respondents indicated they are using LLM apps (like ChatGPT and GitHub Copilot) for at least one use case such as chatbots, customer support and coding.
However, with companies restricting employees’ access to general-purpose models over privacy concerns, the focus has largely shifted towards custom internal solutions, built for privacy, security and regulatory requirements.