UAE’s G42 Unveils ‘Jais’, A Powerful Open-Source Arabic AI Model
G42, the Abu Dhabi-based technology conglomerate focused on AI, has announced the launch of Jais, a new open-source Arabic large language model (LLM). The model is the result of a collaboration between Inception, a G42 company, the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and Cerebras Systems. Its name, “Jais,” is inspired by Jebel Jais, the highest peak in the UAE. The 13-billion-parameter model was trained on a large dataset blending Arabic and English, including some computer code: 395 billion tokens in total, of which 116 billion are Arabic tokens reflecting the diversity of the Arabic language and 279 billion are English tokens included to improve performance across languages. The model aims to bring the potential of AI to 400 million Arabic speakers.
Today, we are excited to announce the launch of Jais, the world’s highest quality open-source Arabic large language model (#LLM) and a collaboration between Inception, a G42 company, @MBZUAI & @CerebrasSystems. pic.twitter.com/fsMV4SHp1g
— G42 (@G42ai) August 30, 2023
How did Jais come into existence?
Jais was trained on supercomputers built by Cerebras Systems. Cerebras makes chips the size of dinner plates that compete with NVIDIA’s AI hardware, and with NVIDIA processors currently in short supply, businesses worldwide are looking for alternatives. On the G42 side, the work was led by its subsidiary Inception. Because Arabic data is scarce, the team relied on English-language data alone to train the model’s ability to reason with computer code. Inception and MBZUAI have also formed academic partnerships with several institutions, which receive exclusive access to the model for testing. Academic collaborators include Carnegie Mellon University, The University of Edinburgh, Sorbonne Paris Nord-LIPN, NYU Abu Dhabi’s CAMeL Lab, and others. The project was started by a group of academics and engineers who saw a shortage of bilingual LLMs.
The model was trained on Cerebras’s Condor Galaxy 1 (CG-1) supercomputer, which G42 and Cerebras Systems unveiled in July. CG-1 is part of a network of nine interconnected supercomputers that promise a new approach to AI computation and a drastic reduction in the time required to train models. Through the CG-1 cloud service offered by Cerebras and G42, customers can tap the efficiency of an AI supercomputer without having to manage physical systems or distribute models across them. Cerebras stated this year that it had sold three units to G42: the first will be delivered in the coming months and the remaining two in 2024.
How will Jais be used?
Jais is expected to speed up innovation and strengthen Abu Dhabi’s reputation as a center for AI research, cultural preservation, and international cooperation. By open-sourcing Jais, Inception hopes to involve the academic, research, and developer communities and accelerate the growth of a thriving Arabic-language AI ecosystem. The model performs far better than existing Arabic models and is competitive with English models of comparable size. It could also serve as a template for other languages that are underrepresented in mainstream AI.
Several organizations will also use Jais, including the UAE Ministry of Foreign Affairs, the UAE Ministry of Industry and Advanced Technology, the Abu Dhabi Department of Health, Etihad Airways, First Abu Dhabi Bank (FAB), and e&. Jais can currently be downloaded from Hugging Face. Users can also register their interest on the Jais website to receive an invitation to try the model online in a playground environment.
Challenges facing the new LLM
One of the main difficulties in training the model was the scarcity of high-quality Arabic-language data compared with English. Drawing on sources including news media, social media, and code, Jais covers both Modern Standard Arabic and the various spoken dialects of the Middle East.
Additionally, the Gulf monarchies’ ambition to lead in AI has raised concerns that ruling elites could misuse the technology. Jais also enters a crowded field: the most advanced LLMs available today, such as GPT-4, which powers OpenAI’s ChatGPT, Google’s PaLM, which powers its Bard chatbot, and Meta’s open-source model LLaMA, can all understand and generate Arabic text. However, according to Andrew Jackson, CEO of Inception, the Arabic capabilities of current models that cover up to 100 languages are watered down.
Jais outperforms Falcon and other open-source models such as LLaMA, which had previously led in Arabic accuracy. According to its creators, Falcon was not pretrained on Arabic. Unlike most US-centric models, Jais also takes the culture and context of the region into account. The model underwent rigorous testing to filter out potentially harmful, sensitive, offensive, or illegal content that does not adhere to the organization’s principles.
Here’s what they have to say
Andrew Jackson, CEO of Inception, said,
We believe that innovation thrives when we collaborate. With this release, we are setting a new standard for AI advancement in the Middle East and ensuring that the Arabic language, with its depth and heritage, finds its voice within the AI landscape. Jais is a testament to our commitment to excellence and our dedication to democratizing AI and promoting innovation.
He further said,
The UAE has been a pioneer in this space (AI), we are ahead of the game, hopefully. We see this as a global race. Most LLMs are English-focused. Arabic is one of the largest languages in the world. Why shouldn’t the Arabic-speaking community have an LLM?
MBZUAI President and University Professor Eric Xing said,
Developing such a high-caliber Arabic LLM demanded cutting-edge AI research in addition to an in-depth and nuanced understanding of the Arabic language, its diversity and heritage, and the growing importance of LLMs across all echelons of society. Thanks to our research and partnerships with Inception and other top regional and global organizations, MBZUAI will continue pioneering LLMs that are efficient, effective, and accurate.
Andrew Feldman, co-founder and CEO of Cerebras Systems, added,
Our strategic partnership with G42 is already delivering pioneering results. A few weeks ago, we introduced the first multi-exaFLOP AI supercomputer, Condor Galaxy 1 (CG-1). Now, the partnership delivers another key breakthrough: the leading Arabic LLM for the open-source community. At Cerebras our passion is building groundbreaking technology. One of the great rewards is seeing the innovative ways it is used. Jais is a significant contribution to the international open-source community. It is also a testament to how incredibly easy CG-1 is to use and how it enables extremely rapid AI model development.