BlockBeats news, October 11: Chainbase, a full-chain data network, recently announced that it has open-sourced Theia-Llama-3.1-8B, a large language model built specifically for the crypto field, on HuggingFace. Chainbase says the model outperforms mainstream models on perplexity and BERT scores, and that its understanding of the crypto domain exceeds that of most mainstream open-source large models.
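For readers unfamiliar with the perplexity metric cited above, it is the exponential of the average negative log-likelihood a model assigns to each token of a text, so lower values mean the model finds the text less surprising. The following is a minimal sketch of that definition only; the token probabilities are hypothetical and are not taken from Chainbase's evaluation.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities assigned to the same 4-token text.
general_model = [math.log(0.25)] * 4  # each token given probability 0.25
domain_model = [math.log(0.5)] * 4    # each token given probability 0.5

# The model that assigns higher probability to the text gets lower perplexity.
print(perplexity(general_model), perplexity(domain_model))
```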

The Chainbase team built what it describes as the first professional Web3 dataset, covering a range of materials from the top 2,000 projects on CoinMarketCap. The dataset was filtered both manually and algorithmically to ensure the accuracy, diversity, and professionalism of the training data. Using this dataset, the team fine-tuned the model efficiently with LoRA and used tools such as DeepSpeed to accelerate training. The model was also quantized to the Q8 GGUF format, which greatly reduces memory usage and improves inference speed.
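The efficiency claims for LoRA and Q8 quantization can be illustrated with some rough arithmetic. The sketch below rests on several assumptions that are not in the announcement: a nominal 8 billion parameters, a hypothetical 4096 x 4096 weight matrix with LoRA rank 8, FP16 as the unquantized baseline, and "Q8" meaning the GGUF Q8_0 layout (32 int8 weights plus one 16-bit scale per block, about 8.5 bits per weight). It estimates weight storage only and ignores activations and the KV cache.

```python
def lora_trainable_params(d_out, d_in, rank):
    """LoRA trains two low-rank factors, (d_out x rank) and (rank x d_in)."""
    return rank * (d_out + d_in)


def weight_memory_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB for a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: one hypothetical 4096 x 4096 matrix adapted with rank 8.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=8)
print(f"LoRA trains {lora} of {full} weights ({lora / full:.2%}) in this matrix")

# Assumption: nominal 8e9 parameters; Q8_0 stores 34 bytes per 32 weights.
N_PARAMS = 8e9
FP16_BITS = 16
Q8_0_BITS = 34 * 8 / 32  # 8.5 bits per weight
fp16_gib = weight_memory_gib(N_PARAMS, FP16_BITS)
q8_gib = weight_memory_gib(N_PARAMS, Q8_0_BITS)
print(f"FP16: {fp16_gib:.1f} GiB, Q8_0: {q8_gib:.1f} GiB")
```

Under these assumptions the quantized weights take roughly half the space of an FP16 copy; the actual savings depend on the baseline precision and the exact quantization variant Chainbase used.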

Theia-Llama-3.1-8B is reportedly Chainbase's first attempt at a large model for the crypto field. The model has already been deployed in TheiaChat, Chainbase's interactive demo application, which currently has more than 300,000 daily active users.