Why and How We Use Real-Time Machine Learning to Monitor Fraudulent Activity at Binance

Main Takeaways
Binance uses machine learning models to monitor suspicious activity on the platform.
An issue known as stale features can negatively impact the performance of such models, causing them to make inaccurate predictions or classifications. 
Our streaming pipeline – or the process of continuously feeding the model with real-time data – consists of two parts: data processing and data serving.
Data processing is further divided into three categories: stream computing, stream ingestion, and stream sinking.
Learn how we use machine learning to safeguard the Binance ecosystem.
Our risk AI team, made up of machine learning engineers and data scientists, works around the clock to combat fraud and protect Binance users. To achieve this, they use AI-driven solutions that can identify and respond to potential threats, such as peer-to-peer (P2P) scams, theft of payment details, and account takeover (ATO) attacks, to name a few.
In this article, we’ll explain how our risk AI team uses a streaming pipeline to ensure real-time responses, and what goes on behind the scenes.
If you’re unfamiliar with machine learning, we recommend reading the following section to get a basic overview of some of the terms that we’ll use throughout this article. 
Batch and Streaming
Machine learning engineers typically use two types of pipelines: batch and streaming. Both have their pros and cons, depending on what the situation requires.
Batch pipelines, as the name implies, process data in batches. Engineers typically use them to process large volumes of data.
On the other hand, streaming pipelines process data in real time as it is collected. This makes them ideal for situations that require a near-instant response; for example, detecting a hacker before they can withdraw funds from a stolen account.
Note that both pipelines are equally important. Streaming pipelines are excellent for providing real-time responses, while batch pipelines are better at handling large volumes of data. 
In the case of fraud prevention, we need to prioritize real-time data to avoid a situation called “model staleness,” which refers to machine learning models becoming outdated or inaccurate.
The Significance of Staleness
Just as people can become less effective at a task if they don't stay updated with the latest information or techniques, machine learning models can become less accurate if they're not regularly updated to reflect the current situation.
One thing you don’t want is a model designed to prevent fraud going stale. The consequences range from the model incorrectly labeling a legitimate transaction as fraudulent to failing to identify a compromised account. Hence, we use a streaming pipeline to ensure fraud prevention models are working with real-time data.
Computing for an Account Takeover (ATO) model
Let’s take an example from our ATO model, which we train to identify accounts that criminals have hijacked with malicious intent. One of the features this model measures is the number of transactions a specific customer has made in the past minute.
Hackers tend to follow a sequential pattern, engaging in a high number of operations, like withdrawals, over a short period. Our system must calculate this feature as soon as possible in case of potential threats. This means minimizing delays between when the user takes action and when this user’s activity data is processed by our models. Just a few seconds can be the difference between stopping a hacker and a user losing all their money. 
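To make the "transactions in the past minute" feature concrete, here is a minimal Python sketch of a per-user sliding-window counter. It is illustrative only and is not the production pipeline: the class name `SlidingWindowCounter`, its methods, and the assumption that timestamps arrive in order for each user are all hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per user within the last `window_seconds` seconds (hypothetical helper)."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        # user_id -> timestamps of that user's events, oldest first
        self._events = defaultdict(deque)

    def record(self, user_id: str, timestamp: float) -> None:
        """Register one transaction. Assumes timestamps arrive in order per user."""
        self._events[user_id].append(timestamp)

    def count(self, user_id: str, now: float) -> int:
        """Return the number of transactions in the interval (now - window, now]."""
        timestamps = self._events[user_id]
        cutoff = now - self.window_seconds
        # Drop events that have fallen out of the window.
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        return len(timestamps)


counter = SlidingWindowCounter(window_seconds=60.0)
for ts in (0.0, 30.0, 50.0, 55.0):
    counter.record("user-1", ts)

# At t=70 the event at t=0 is older than 60 seconds and no longer counts.
recent_count = counter.count("user-1", now=70.0)
```

The longer the gap between an event happening and `record` being called, the more the count lags behind what the user is actually doing, which is the delay the paragraph above is concerned with.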
For more information on how delayed features affect model performance, you can refer to this article by LinkedIn Engineering: Near real-time features for near real-time personalization.
The role of batch computing
Note that the importance of feature staleness can depend on the model or feature being used. Some features, for instance, are relatively stable. In the ATO case mentioned above, we would also need to retrieve data on the user’s withdrawals in the past 30 days to calculate a ratio based on their most recent transactions.
In this case, batch computing over longer time periods, such as daily or hourly intervals, is acceptable despite the higher staleness resulting from waiting for data to arrive in data warehouses and for batch jobs to run periodically. 
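As a rough illustration of how a fresh streaming feature and a slower batch feature might be combined, the sketch below divides a one-minute count by the user's average per-minute rate over 30 days. The exact ratio used in the ATO model is not described in this article, so the formula, the function name `withdrawal_burst_ratio`, and the handling of users with no history are all assumptions.

```python
MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def withdrawal_burst_ratio(count_last_minute: int, count_last_30_days: int) -> float:
    """Compare the latest minute with the user's 30-day average per-minute rate.

    count_last_minute  -- streaming feature, must be fresh
    count_last_30_days -- batch feature, can be hours old without much harm
    """
    # Assumption: treat users with no history as having a baseline of one event,
    # which avoids dividing by zero.
    baseline = max(count_last_30_days, 1)
    return count_last_minute * MINUTES_IN_30_DAYS / baseline


# Three withdrawals in the last minute against 432 in the last 30 days.
ratio = withdrawal_burst_ratio(count_last_minute=3, count_last_30_days=432)
```

The 30-day denominator moves slowly, so an hourly or daily refresh barely changes the result, while the numerator can change completely within seconds.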
Balancing Freshness and Latency
Ultimately, the choice between batch and streaming pipelines should be made based on the specific requirements of the use case and the features in question. Carefully considering these factors allows us to build effective fraud prevention systems that protect our users.
Using a streaming pipeline allows us to prioritize freshness over latency for time-sensitive features. The diagram above illustrates this need, as the count of operations for retrieving features should be three instead of two. 
This is why a real-time machine learning pipeline is crucial to our risk team’s day-to-day operations. 
Breaking Down the Streaming Pipeline
Real-time machine learning for the Binance risk AI team primarily consists of two parts:
Data processing (upper portion of the diagram)
Data serving (lower portion of the diagram)
Data Processing
For data processing, we can divide our streaming pipeline (Flink jobs) into three categories based on responsibility:
Stream computing: feature engineering
Stream ingestion: feature ingestion
Stream sinking: data enrichment 
Stream computing
The stream computing component of the pipeline is responsible for near-real-time feature engineering, the process of extracting features from raw data.
It pre-computes features that our machine learning models will use for online prediction. There are two types of computing methods for the stream computing pipeline: time-based and event-based.
Time-based. Calculating the number of transactions every 10 minutes. This introduces some staleness but lowers latency. 
Event-based. Calculating features based on the arriving event. This lowers staleness but slightly increases latency.
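The difference between the two methods can be sketched in plain Python. This is a simplified stand-in, not Flink code: the function names and the `(user_id, timestamp)` event shape are assumptions made for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 600  # 10-minute windows, matching the time-based example above


def time_based_counts(events):
    """Count events per (user, window start). A count is only final once its window closes."""
    counts = Counter()
    for user_id, timestamp in events:
        window_start = int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(user_id, window_start)] += 1
    return dict(counts)


def event_based_counts(events):
    """Emit an updated running count for the user every time an event arrives."""
    running = Counter()
    for user_id, timestamp in events:
        running[user_id] += 1
        yield user_id, timestamp, running[user_id]


events = [("user-1", 10), ("user-1", 590), ("user-1", 610), ("user-2", 620)]

windowed = time_based_counts(events)
per_event = list(event_based_counts(events))
```

In the time-based version, a reader of the feature sees at most one value per window, so the value can be up to ten minutes old. In the event-based version, every event triggers a computation, so the value is fresher at the cost of doing work on each arrival.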
Our preference is to avoid real-time computation (computing features at request time) wherever possible, and here's why:
There is a trade-off between latency and staleness. Computing features as online requests arrive limits computing logic to lightweight approaches. While this method lowers staleness, the feature computation increases prediction latency.
Scaling independently is challenging since the prediction and computing services depend on each other.
On-demand computing based on request traffic creates unpredictable scaling pressure.
Real-time computation is not adaptable to our model monitoring (training-serving skew) and feature monitoring solutions since the features are not stored in a central database, i.e., a feature store.
Stream ingestion
The stream ingestion component is responsible for the near-real-time ingestion of features into our feature store from the machine learning platform Kafka. Feature stores are centralized databases that house commonly used features. They play an essential role in machine learning pipelines. You can learn more about them in the following articles: A Closer Look at Our Machine Learning Feature Store and Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline.
Stream sink
The stream sink component is mainly responsible for sinking real-time events into a specific destination, such as distributed storage like the Hadoop Distributed File System (HDFS) or S3, or other external databases such as Elasticsearch, depending on project requirements.
For our risk AI team, there are generally two data enrichment patterns that can be applied to real-time data in Kafka depending on the use case: 
Static data. For example, retrieving a list of popular makers stored in S3 for P2P-related business projects in Flink jobs. The reference data is static and needs updating less than once a month.
Dynamic data. Real-time exchange rates (BTC to USD), for example, are obtained from external databases like Redis. Per-record lookup ensures low latency and high accuracy if reference data changes.
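The two enrichment patterns above can be sketched as follows. This is a minimal illustration rather than the actual Flink job: a Python set stands in for the static list loaded from S3, a dictionary stands in for Redis, and the function name `enrich_events` and the event field names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Set


def enrich_events(
    events: Iterable[Dict],
    popular_makers: Set[str],
    get_rate: Callable[[str], float],
) -> Iterator[Dict]:
    """Attach static and dynamic reference data to each event as it streams through."""
    for event in events:
        enriched = dict(event)
        # Static pattern: the reference set is loaded once and reused for every record.
        enriched["is_popular_maker"] = event["maker_id"] in popular_makers
        # Dynamic pattern: the rate is looked up per record, so changes are picked up.
        enriched["amount_usd"] = event["amount_btc"] * get_rate("BTC-USD")
        yield enriched


rates = {"BTC-USD": 30000.0}   # stand-in for an external store such as Redis
popular_makers = {"maker-7"}   # stand-in for a list loaded once from S3
events = [
    {"maker_id": "maker-7", "amount_btc": 2.0},
    {"maker_id": "maker-9", "amount_btc": 1.0},
]

stream = enrich_events(events, popular_makers, lambda pair: rates[pair])
first = next(stream)
rates["BTC-USD"] = 31000.0     # the reference data changes between two records
second = next(stream)
```

Because the rate is fetched for each record, the second event is converted at the updated rate, while the static set is never re-read during the run.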
Data Serving
The data-serving component of the pipeline is responsible for online prediction and batch computing.
Online prediction. This occurs when requests come through the Decision Hub (our risk team’s internal rule engine). The relevant service will then call the feature store to retrieve features and send them to the machine-learning model for scoring. Our Risk AI team has over 20 machine-learning models designed to handle different business requirements.
Batch computing. Although it can introduce a delay of up to a few days, it plays an important role here since it complements the features computed in real time. 
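The online prediction flow described above (request arrives, features are retrieved from the feature store, the model scores them) can be sketched as below. Everything here is a simplified assumption: the feature store is an in-memory dictionary, the feature names are invented, and `toy_model` is a placeholder rule rather than one of the actual models.

```python
from typing import Callable, Dict, List

# Stand-in for a feature store: user_id -> {feature name: value}.
# It holds both streaming features (fresh) and batch features (older).
FeatureStore = Dict[str, Dict[str, float]]


def predict(
    user_id: str,
    feature_names: List[str],
    feature_store: FeatureStore,
    model: Callable[[List[float]], float],
    default: float = 0.0,
) -> float:
    """Fetch pre-computed features for a user and pass them to the model for scoring."""
    stored = feature_store.get(user_id, {})
    # Missing features fall back to a default instead of failing the request.
    vector = [stored.get(name, default) for name in feature_names]
    return model(vector)


def toy_model(vector: List[float]) -> float:
    """Placeholder scorer: flags a user with three or more transactions in the last minute."""
    txn_count_1m, _withdrawals_30d = vector
    return 1.0 if txn_count_1m >= 3 else 0.0


feature_store: FeatureStore = {
    "user-1": {"txn_count_1m": 4.0, "withdrawals_30d": 12.0},
    "user-2": {"txn_count_1m": 1.0, "withdrawals_30d": 40.0},
}
feature_names = ["txn_count_1m", "withdrawals_30d"]

score_user_1 = predict("user-1", feature_names, feature_store, toy_model)
score_user_2 = predict("user-2", feature_names, feature_store, toy_model)
```

Because the features are pre-computed and read from the store, the prediction path does only a lookup and a model call, which keeps request latency independent of how expensive the feature logic is.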
Closing Thoughts
It’s important to note that the crypto market operates 24/7, unlike traditional financial markets that have opening and closing times. Every second, there is a continuous influx of new data (withdrawals, deposits, trades, etc.) that requires us to be on the lookout for bad actors trying to steal user funds or personal information.
Our risk AI team has worked tirelessly to develop and maintain a sophisticated AI system that can effectively flag suspicious activity. Thanks to their efforts, we can work quickly to protect compromised Binance accounts from potential losses or mitigate the damage as much as possible. 
Stay tuned for more insights into our machine-learning efforts, or check out some of our previous articles below. Interested in a machine learning career at Binance? Check out Binance Engineering on our careers page for open job postings. 
Further Reading
A Closer Look at Our Machine Learning Feature Store
Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline
A Feature Engineering Case Study in Consistency and Fraud Detection