Introducing Gemma models in Keras


Gemma models are now available in Keras. Here is a look at the features that ship with this release and a glance at what comes next.

About Gemma models in Keras

The Keras team is excited to announce that Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, is now available in the KerasNLP collection.

 

Thanks to Keras 3, Gemma runs on JAX, TensorFlow and PyTorch. With this release, Keras is also introducing several new features designed specifically for large language models: a new LoRA (Low-Rank Adaptation) API and large-scale model-parallel training capabilities.

 


Gemma open models

Gemma comes in portable 2B and 7B parameter sizes, and delivers significant advances over similar open models, and even over some larger ones:

 

Gemma 7B scores 64.3% of correct answers on the MMLU language understanding benchmark (vs. 62.5% for Mistral-7B and 54.8% for Llama2-13B).

 

Gemma adds +11 percentage points to the GSM8K benchmark score for grade-school math problems (46.4% for Gemma 7B vs. 35.4% for Mistral-7B and 28.7% for Llama2-13B).

 

And +6.1 percentage points of correct answers on HumanEval, a coding benchmark (32.3% for Gemma 7B vs. 26.2% for Mistral 7B and 18.3% for Llama2 13B).

 

Gemma models ship with a familiar KerasNLP API and a very readable Keras implementation. You can instantiate the model with a single line of code; see code snippet (1) below.

 

And run it immediately on a text prompt; see snippet (2) below. Tokenization is built in, but you can easily split it out if needed, as the short sketch below illustrates.
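For example, here is a minimal sketch of handling tokenization separately. It assumes the KerasNLP GemmaTokenizer class and the "gemma_2b_en" preset name; adjust to the preset you actually loaded.

# Sketch: splitting tokenization out of the model.
# The preset name "gemma_2b_en" is an assumption; use the one you loaded above.
tokenizer = keras_nlp.models.GemmaTokenizer.from_preset("gemma_2b_en")
token_ids = tokenizer("Keras is a")      # text -> token ids
text = tokenizer.detokenize(token_ids)   # token ids -> text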

 

Enhancing Gemma Models with LoRA

 

With Keras 3 you can also select the backend on which you want to run the model. Here is how to switch; see snippet (3) below.

 

Keras 3 comes with several new features designed specifically for large language models. Chief among them is a new LoRA (Low-Rank Adaptation) API for parameter-efficient fine-tuning. Here is how to activate it; see snippet (4) below.

 

This single line reduces the number of trainable parameters from 2.5 billion to 1.3 million.
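As a rough illustration of what a LoRA fine-tuning run could look like once the adapter is enabled, here is a sketch; the one-example dataset and the hyperparameters are placeholders for illustration, not recommendations.

# Minimal LoRA fine-tuning sketch; data and hyperparameters are placeholders.
data = ["Instruction: Greet the user.\nResponse: Hello! How can I help you today?"]
gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=5e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.summary()  # check the reduced trainable parameter count
gemma_lm.fit(data, epochs=1, batch_size=1)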

 

Fine-tuning Gemma models on multiple GPUs/TPUs

Keras 3 also supports large-scale model training, and Gemma is the perfect model to try it out. The new Keras distribution API offers data-parallel and model-parallel distributed training options.

 

The new API is multi-backend, but for the time being it is implemented for the JAX backend only, because of its proven scalability (Gemma models were trained with JAX).
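For completeness, here is a sketch of the simpler data-parallel option, which replicates the model on every accelerator and splits the batches. The "gemma_2b_en" preset name is an assumption, and the distribution must be set before the model is loaded.

# Data-parallel sketch: replicate weights, shard batches across devices.
data_parallel = keras.distribution.DataParallel(
    devices=keras.distribution.list_devices())
keras.distribution.set_distribution(data_parallel)
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")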

 

To fine-tune the larger Gemma 7B, a distributed setup is useful, for example a TPUv3 with 8 TPU cores (which you can get for free on Kaggle) or an 8-GPU machine from Google Cloud. Here is how to configure the model for distributed training using model parallelism; see snippet (5) below.

 

What the snippet (5) code does is arrange the 8 accelerators into a 1 x 8 mesh whose two dimensions are called "batch" and "model". Model weights are sharded on the "model" dimension, here split across the 8 accelerators, while data batches are not partitioned since the "batch" dimension is 1.
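If you want to verify the sharding after running snippet (5), a quick inspection sketch on the JAX backend looks like this. The layer name "decoder_block_1" is an assumption; list gemma_lm.backbone.layers to find the exact names in your version.

# Print how one decoder block's weights are laid out across devices.
block = gemma_lm.backbone.get_layer("decoder_block_1")
for variable in block.weights:
    # variable.value is a jax.Array here; its .sharding shows the device layout
    print(variable.path, variable.shape, variable.value.sharding)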

 

What’s More

 

Google will soon publish a guide showing how to correctly partition a Transformer model and write the 6 lines of partitioning setup above.

 

Full GSPMD model parallelism works here with just a few sharding hints because Keras passes these settings to the powerful XLA compiler, which figures out all the remaining details of the distributed computation.

 

We hope you will enjoy playing with Gemma models. And if you want to share your fine-tuned weights with the community, the Kaggle Model Hub now supports user-tuned weight uploads. Just head to the model page for Gemma models on Kaggle and see what others have already created.

 

Code snippets

1. gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")  # or "gemma_7b_en"

 

2. gemma_lm.generate("Keras is a", max_length=32)

# Example output: "Keras is a popular deep learning framework for neural networks..."

 

3. import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" or "torch"
import keras  # import keras only after selecting the backend

 

4. gemma_lm.backbone.enable_lora(rank=4)
# Note: rank=4 replaces the weight matrices of relevant layers with the
# product AxB of two matrices of rank 4, which reduces the number of
# trainable parameters.

 

5. device_mesh = keras.distribution.DeviceMesh(
    (1, 8),  # mesh topology
    ["batch", "model"],  # named mesh axes
    devices=keras.distribution.list_devices())  # actual accelerators

# Model config
layout_map = keras.distribution.LayoutMap(device_mesh)
layout_map["token_embedding/embeddings"] = (None, "model")
layout_map["decoder_block.*attention.*(query|key|value).*kernel"] = (
    None, "model", None)
layout_map["decoder_block.*attention_output.*kernel"] = (
    None, None, "model")
layout_map["decoder_block.*ffw_gating.*kernel"] = ("model", None)
layout_map["decoder_block.*ffw_linear.*kernel"] = (None, "model")

# Set the model config and load the model
model_parallel = keras.distribution.ModelParallel(
    device_mesh, layout_map, batch_dim_name="batch")
keras.distribution.set_distribution(model_parallel)
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_7b_en")

# Ready: you can now train with model.fit() or generate text with generate()

 
