Databricks releases Dolly 2.0 for commercial use

Databricks, a data analytics company, has launched Dolly 2.0, a text-generating AI model that can power chatbots, text summarizers, and rudimentary search engines. It is the follow-up to the original Dolly, which was released in March.

The new model is licensed for commercial use, making it available to both independent developers and businesses. Ali Ghodsi, CEO of Databricks, said that Dolly 2.0 was prompted by market demand for more open and transparent large language models (LLMs). The company hopes to encourage other organizations to use their own private data sets to create, train, and own AI-powered chatbots and other productivity tools.

Dolly 2.0 is a 12B parameter language model based on the EleutherAI Pythia model family, fine-tuned exclusively on a new, high-quality, human-generated instruction-following dataset called databricks-dolly-15k. Databricks built the dataset from 15,000 records written by thousands of Databricks employees who volunteered. The original Dolly, by contrast, was built on GPT-J-6B, an open-source text-generating model also provided by the nonprofit research group EleutherAI. In both cases, the goal of the fine-tuning was to teach the base model to follow instructions in a chatbot-like fashion.

Databricks is open sourcing the entirety of Dolly 2.0, including the training code, the dataset, and the model weights, all suitable for commercial use. This means that any organization can create, own, and customize powerful LLMs that can talk to people, without paying for API access or sharing data with third parties.

The databricks-dolly-15k dataset contains 15,000 high-quality human-generated prompt/response pairs designed specifically for instruction tuning large language models. Under the licensing terms, anyone can use, modify, or extend this dataset for any purpose, including commercial applications.

The dataset was authored by more than 5,000 Databricks employees during March and April of 2023, and the training records are natural, expressive, and designed to represent a wide range of behaviors, from brainstorming and content generation to information extraction and summarization.
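For readers curious what instruction tuning on prompt/response pairs looks like in practice, here is a minimal sketch of how one record might be turned into a training example. The field names (`instruction`, `context`, `response`), the prompt wording, and the sample record are all assumptions made for illustration; none of them come from this article, and Databricks' actual training code may format records differently.

```python
from typing import Mapping

# Assumed preamble; the wording is illustrative, not taken from the article.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def format_record(record: Mapping[str, str]) -> str:
    """Join one prompt/response pair into a single training string.

    The keys used here are assumed field names for a dataset record.
    """
    parts = [INTRO, "### Instruction:\n" + record["instruction"]]

    # Some tasks (summarization, extraction) come with a reference passage;
    # include it only when it is non-empty.
    context = record.get("context", "").strip()
    if context:
        parts.append("### Context:\n" + context)

    parts.append("### Response:\n" + record["response"])
    return "\n\n".join(parts)


# Hypothetical record, written for this example only.
example = {
    "instruction": "Summarize the passage in one sentence.",
    "context": "Dolly 2.0 is a 12B parameter model released for commercial use.",
    "response": "Dolly 2.0 is a commercially usable 12B parameter language model.",
}

if __name__ == "__main__":
    print(format_record(example))
```

During fine-tuning, strings like this would be tokenized and fed to the base model so that it learns to produce the response section when given the instruction section.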

The sources for this piece include articles in TechCrunch and Databricks.
