Abstract: With the increasing demand for intelligent mobile applications, deploying large language models (LLMs) on Android devices has emerged as a significant challenge. Because Android platforms are resource-constrained, they require careful engineering to manage the computational and memory demands of LLMs. This paper provides a comprehensive review of techniques for efficient LLM deployment on mobile devices, including transformer optimization, federated learning, quantization-aware training, and privacy-preserving inference. We survey recent advances in model compression, neural network optimization, and secure aggregation that enable on-device AI while preserving performance, privacy, and user experience. The review emphasizes the trade-offs among model size, accuracy, and energy efficiency, offering practical insights for developing the next generation of AI-driven Android applications.
Keywords: On-device AI, Android, large language models, federated learning, quantization-aware training, model compression, privacy-preserving inference.
DOI: 10.17148/IARJSET.2024.111015