CoinVoice has learned that Chainbase, a full-chain data network, has announced that it will open-source Theia-Llama-3.1-8B, a large language model designed specifically for the crypto field, on Hugging Face.
The Chainbase team built what it describes as the first professional Web3 dataset, covering data from the top 2,000 projects on CoinMarketCap. The dataset was filtered both manually and algorithmically to ensure the accuracy, diversity, and professionalism of the training data. Using this dataset, the team fine-tuned the model efficiently with LoRA (Low-Rank Adaptation) and used tools such as DeepSpeed to speed up training. The model is quantized to the 8-bit Q8 GGUF format, which substantially reduces memory usage and improves inference speed.
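To put the memory claim in perspective, the Python sketch below estimates weight storage for a nominal 8-billion-parameter model and shows the basic idea behind 8-bit block quantization. It assumes "Q8" refers to the common Q8_0 type used in llama.cpp-style GGUF files, where blocks of 32 int8 values share one 16-bit scale. The function names are illustrative, and this is not Chainbase's code.

```python
# A minimal sketch of Q8_0-style block quantization, assuming 32-weight blocks,
# one fp16 scale per block, and int8 values. Illustrative only; not Chainbase's code.

BLOCK_SIZE = 32                    # weights per block (assumed Q8_0 layout)
BYTES_PER_BLOCK = 2 + BLOCK_SIZE   # 2-byte fp16 scale + 32 one-byte int8 values


def quantize_block(weights):
    """Map one block of floats to (scale, int8 values in [-127, 127])."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127 if amax > 0 else 0.0
    inv = 1 / scale if scale else 0.0
    # Round to the nearest integer (Python's round() breaks ties to even).
    return scale, [round(w * inv) for w in weights]


def dequantize_block(scale, qs):
    """Recover approximate floats from a quantized block."""
    return [scale * q for q in qs]


def weight_memory_gb(n_params, bytes_per_param):
    """Storage for the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    n = 8e9  # nominal "8B" parameter count (assumption; the exact count differs slightly)
    fp16 = weight_memory_gb(n, 2)
    q8 = weight_memory_gb(n, BYTES_PER_BLOCK / BLOCK_SIZE)
    print(f"FP16 weights: {fp16:.1f} GB, Q8_0-style weights: {q8:.1f} GB")
```

Under these assumptions the weights shrink from about 16 GB at FP16 to about 8.5 GB (8.5 bits per weight once the per-block scale is counted). Real GGUF files and runtime memory also include metadata, any unquantized tensors, and the KV cache, so actual figures will differ.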
Theia-Llama-3.1-8B is reportedly Chainbase's first attempt at a large model for the crypto field. The model is already used in TheiaChat, Chainbase's interactive demo application, which currently has more than 300,000 daily active users.