At deepsense.ai, we have developed a solution that combines advanced Retrieval-Augmented Generation (RAG) with Small Language Models (SLMs) to extend what embedded devices can do. SLMs, which have 3 billion parameters or fewer, are smaller, faster, and lighter than traditional large language models. Running SLMs directly on edge devices gives businesses three benefits: lower costs, because cloud inference is no longer needed; seamless offline use; and stronger data privacy, because data is processed locally.

Our ongoing research focuses on improving SLMs further. The directions include better hardware utilization, 1-bit LLMs for lower memory use and faster inference, mixture-of-experts models, and sparse kernels combined with pruning.

We have also built a complete RAG pipeline with SLMs that runs on resource-constrained Android devices. Doing so meant addressing limited memory, the need for platform independence, and the uneven maturity of on-device inference engines. Our tech stack includes llama.cpp for SLM inference, bert.cpp for embedding models, Faiss for efficient similarity search, Conan for package management, and Ragas for automated RAG evaluation.

For more information and a free consultation, visit our AI Lab on Telegram (@itinai) or follow us on Twitter (@itinaicom).
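The retrieval step of a RAG pipeline like the one described above can be sketched as follows. This is a minimal illustration in Python, not the C++ stack named in the post. It assumes a hypothetical `ToyEmbedder` (normalized term counts over a fixed vocabulary) in place of a bert.cpp embedding model, and a hypothetical pure-Python `FlatIPIndex` that mimics what a flat inner-product Faiss index does. The chunks and the question are made-up examples, and the final call to the SLM (for example through llama.cpp) is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyEmbedder:
    """Hypothetical stand-in for an embedding model: L2-normalized term counts."""

    def __init__(self, corpus: list[str]) -> None:
        vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
        self.vocab_index = {tok: i for i, tok in enumerate(vocab)}

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * len(self.vocab_index)
        for tok, count in Counter(tokenize(text)).items():
            if tok in self.vocab_index:  # words outside the vocabulary are ignored
                vec[self.vocab_index[tok]] += count
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


class FlatIPIndex:
    """Hypothetical brute-force inner-product index; on normalized vectors this is cosine similarity."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []

    def add(self, vec: list[float]) -> None:
        self.vectors.append(vec)

    def search(self, query: list[float], k: int) -> list[tuple[int, float]]:
        # Score every stored vector, then keep the k best (ties keep insertion order).
        scores = [(i, sum(q * v for q, v in zip(query, vec))) for i, vec in enumerate(self.vectors)]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


def build_prompt(question: str, chunks: list[str], embedder: ToyEmbedder,
                 index: FlatIPIndex, k: int = 2) -> str:
    """Retrieve the top-k chunks and place them in a prompt for the SLM."""
    hits = index.search(embedder.embed(question), k)
    context = "\n".join(f"- {chunks[i]}" for i, _ in hits)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}\nAnswer:"


# Usage with made-up document chunks.
chunks = [
    "The pump must be descaled every 200 operating hours.",
    "Error E42 means the water tank is empty.",
    "The warranty covers parts for two years.",
]
embedder = ToyEmbedder(chunks)
index = FlatIPIndex()
for chunk in chunks:
    index.add(embedder.embed(chunk))
prompt = build_prompt("What does error E42 mean?", chunks, embedder, index, k=1)
print(prompt)
```

Normalizing vectors before taking the inner product makes the score equal to cosine similarity, a common convention with Faiss's `IndexFlatIP`. A brute-force scan keeps the sketch simple; for larger corpora, Faiss also offers approximate indexes that trade some recall for speed.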