Hugging Face partners with ServiceNow for StarCoder and StarCoderBase

Hugging Face and ServiceNow’s BigCode partnership is making progress on large language models (LLMs) for code with the release of StarCoder and StarCoderBase, both developed with an emphasis on ethical principles.

StarCoder and StarCoderBase were trained on permissively licensed data from GitHub, which includes over 80 programming languages, Git commits, GitHub issues, and Jupyter notebooks.

StarCoder was trained on 1 trillion tokens and has an 8,192-token context window. It generates realistic code and works with a variety of programming languages. It is distributed under the OpenRAIL-M license, which places legal restrictions on its use and modification. Like other LLMs, StarCoder can generate inaccurate or biased output, and it is important to recognize these limitations and work toward overcoming them.

StarCoderBase surpasses other open Code LLMs on several prominent programming benchmarks and is on par with, if not better than, closed models such as OpenAI’s code-cushman-001. Its context length, which exceeds 8,000 tokens, enables it to process more input than any other open LLM currently available.

The researchers also released the model, including intermediate checkpoints, under an OpenRAIL license. All training and preprocessing code is released under the Apache 2.0 license. Additional materials include a comprehensive evaluation framework for code models, a new dataset for training and evaluating PII-removal methods, and a tool that traces generated code back to its source in the training dataset.

The sources for this piece include an article in MarkTechPost.
