# 🌌 Native Multimodal Model (NMM)

A complete, end-to-end implementation of a Native Multimodal Model built with JAX/Flax, featuring a high-performance training pipeline and a modern React-based chat interface.


## Overview

This project provides a full-stack solution for training and interacting with multimodal large language models. Unlike traditional modular approaches, this repository focuses on native multimodal integration, where images and text are processed within a unified architecture.
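The core idea of native multimodal integration can be sketched as follows. This is a hypothetical illustration, not code from this repository: the embedding dimensions, patch count, and image-before-text ordering are assumptions. The point is that both modalities end up in one shared embedding space and one sequence, consumed by a single model.

```python
import jax
import jax.numpy as jnp

def fuse_modalities(image_emb, text_emb):
    """Fuse image-patch and text-token embeddings into one sequence.

    Both inputs are assumed to already live in the same d_model space
    (e.g. image patches passed through a projection layer). A single
    transformer then attends over the fused sequence.
    """
    # image_emb: (num_patches, d_model), text_emb: (num_tokens, d_model)
    return jnp.concatenate([image_emb, text_emb], axis=0)

key = jax.random.PRNGKey(0)
image_emb = jax.random.normal(key, (64, 512))  # e.g. an 8x8 patch grid
text_emb = jax.random.normal(key, (16, 512))   # 16 text tokens
seq = fuse_modalities(image_emb, text_emb)     # shape (80, 512)
```

Because the fused sequence is ordinary token embeddings, no separate vision tower output needs to be stitched in at inference time; the model treats image patches as just more positions in the context.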

## Key Features

- **JAX-Powered Backend:** High-performance, scalable training built on JAX and Flax/Linen.
- **Full Pipeline:** Scripts for tokenizer training, data preparation (LLaVA), pretraining, and Supervised Fine-Tuning (SFT).
- **Native Multimodal:** Unified processing of text and image data within a single architecture.
- **Modern Chat UI:** A sleek, responsive frontend built with React, Vite, and TailwindCSS for real-time interaction.

## Project Structure

The repository is organized into two main components:

| Component | Description |
| --- | --- |
| `native-mm/` | The core machine learning codebase (JAX models, training scripts, data loaders). |
| `chat-ui/` | The React-based frontend application for interacting with the model. |

## Getting Started

### 1. Backend Setup (`native-mm/`)

```sh
cd native-mm
uv venv
uv pip install .
```

Refer to the `native-mm` documentation for detailed steps on:

- Training your own tokenizer.
- Preparing the LLaVA dataset.
- Running Pretraining & SFT.
- Starting the Chat Server.
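To give a feel for the pretraining/SFT stage, here is a minimal JAX training step, a sketch only: the linear "model", the next-token cross-entropy loss, and plain SGD are stand-ins, and the repository's actual scripts, optimizer, and loss may differ.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, inputs, targets):
    """Cross-entropy over a stand-in linear 'model' (placeholder for the NMM)."""
    logits = inputs @ params["w"]                 # (batch, vocab)
    logp = jax.nn.log_softmax(logits, axis=-1)
    # Pick out the log-probability of each target token and average.
    return -jnp.mean(jnp.take_along_axis(logp, targets[:, None], axis=-1))

@jax.jit
def train_step(params, inputs, targets, lr=1e-2):
    """One SGD step: compute loss and gradients, then update in place."""
    loss, grads = jax.value_and_grad(loss_fn)(params, inputs, targets)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

key = jax.random.PRNGKey(0)
params = {"w": 0.02 * jax.random.normal(key, (32, 100))}  # toy 32-dim, 100-token vocab
inputs = jax.random.normal(key, (8, 32))
targets = jnp.arange(8) % 100
params, loss = train_step(params, inputs, targets)
```

The same `train_step` shape applies to both pretraining and SFT; only the data pipeline and loss masking typically change between the two phases.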

### 2. Frontend Setup (`chat-ui/`)

```sh
cd chat-ui
npm install
npm run dev
```

The UI will be available at `http://localhost:5173`. Make sure the backend chat server (from `native-mm`) is running to enable chat functionality.
