Selecting the Right Scoring Pattern for Machine Learning
According to Gartner’s 2019 CIO Survey, AI adoption by businesses grew 270% over the previous four years, and 37% of businesses have implemented AI in some form. Businesses are adopting the technology at staggering rates, and Chief Information Officers and data scientists face difficult decisions about which speed of AI scoring fits their business needs.
AI scoring can be broken down into three patterns: batch, event-driven, and real-time. Each pattern provides different capabilities, depending on the goal of the model. For example, while batch computing may work well in a payroll setting, it would not be an effective way to detect fraud in banking transactions.
By understanding these three methods and their trade-offs, companies can extract the most value from their data and models.
1. Batch Processing
Batch processing is an effective way of handling high volumes of data for AI. Transactions are collected over a period of time and then processed as a single batch. Historically, batch processing has been used for predictive analytics, since a large volume of data yields more accurate results and insights. With the rise of streaming data, processing options are expanding, but batching remains the most common way to distill vast amounts of data into business strategy.
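As a concrete illustration, the minimal sketch below scores an accumulated set of records in a single pass with a previously trained model. The model, feature names, and output file are toy stand-ins chosen only to show the batch pattern, not a real customer-loyalty pipeline.

```python
# A minimal sketch of batch scoring, assuming pandas and scikit-learn
# are available; all data, features, and the model are illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in for a model trained earlier on historical data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = (X_train.sum(axis=1) > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# Stand-in for the records accumulated since the last batch run.
batch = pd.DataFrame(rng.normal(size=(10_000, 3)),
                     columns=["recency", "frequency", "monetary"])

# Score the entire batch in one pass; no per-record latency requirement.
batch["score"] = model.predict_proba(batch.to_numpy())[:, 1]

# Persist the scores for downstream reports, billing, or campaigns.
batch.to_csv("customer_scores.csv", index=False)
```

Because nothing here has to happen the instant a record arrives, the run can be scheduled for off-peak hours, which is where the pattern's cost and timing advantages come from.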
This form of data processing is often used for performing bank transactions after hours, running business reports, and billing clients at the correct interval. Batch models may also be applied to score customer loyalty, lifetime value, or segment membership, at intervals ranging from several times a day to once a month. Any task that does not require real-time data input may be a candidate for batch processing.
Finally, batch processing offers flexibility in timing, along with efficiency in cost and network usage, when the number of models and the data volumes are limited. For the many scenarios that do not require constant data processing, batching may be the best option. In environments where hundreds of models are applied to gigabytes or terabytes of data, however, batch runs can quickly become problematic in both timing and cost.
2. Event-driven Processing
Not all data is created equal. With the proliferation of IoT devices, sensors, and applications emitting bytes around the clock, data scientists are faced with the task of prioritization. The majority of data is insignificant and does not require a model to be scored; when an atypical event does occur, however, AI can kick in and determine the best next steps.
An event is any significant change in state: purchasing a new vehicle, buying a house, having a baby, or receiving a large sum of money. The event could also be small-scale, such as sending an email, visiting a particular website, or a device reaching a certain temperature. Whatever event the system is designed to track, its occurrence triggers the AI to respond with the appropriate action.
Event-driven processing has proven exceptionally useful for marketing, as consumers respond better when businesses are attuned to their day-to-day lives. This type of processing is also valuable for automating tasks such as inventory control, or for detecting when someone has arrived home or departed.
Businesses should implement event-driven processing when they can pre-identify metrics that should be linked to consistent actions. AI can be used both to identify these metrics and to take action.
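As a minimal sketch of the pattern, the handler below discards insignificant events and invokes a model only when a pre-identified event type arrives. The event fields, event types, and the score_next_best_action placeholder are all hypothetical, standing in for a real message queue and model service.

```python
# A minimal sketch of event-driven scoring; everything named here is
# illustrative, not a real event schema or model API.
SIGNIFICANT_EVENTS = {"new_vehicle", "home_purchase", "large_deposit"}

def score_next_best_action(event: dict) -> str:
    # Placeholder for a real model call that returns a recommended action.
    if event["type"] == "new_vehicle":
        return "offer_auto_insurance"
    return "send_welcome_offer"

def handle_event(event: dict) -> None:
    # Most events are insignificant and never reach the model.
    if event["type"] not in SIGNIFICANT_EVENTS:
        return
    # Only atypical, pre-identified events trigger scoring and action.
    action = score_next_best_action(event)
    print(f"customer {event['customer_id']}: {action}")

handle_event({"type": "page_view", "customer_id": 7})   # ignored
handle_event({"type": "new_vehicle", "customer_id": 42})  # scored
```

The filter-then-score structure is the essence of the pattern: the model runs only on the small fraction of traffic that represents a meaningful change in state.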
3. Real-time Processing
With the growth of digital identifiers linked to Internet browsing, data is generated continuously, and data scientists strive to process it at the same rate it arrives: real-time processing. This method requires constant input, processing, and output. Streaming has given rise to fast data, and companies around the world are finding new value in this approach.
An example of data processed in real time by AI is fraud detection on credit card purchases. Within a few milliseconds, a bank must register the input, apply a scoring model, and determine next steps. Depending on the score, the bank will either authorize the purchase or flag it as fraudulent. Real-time AI processing is also necessary for medical diagnosis, speech recognition, market analysis, consumer recommendations, and robotics, among many other applications.
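A minimal sketch of that authorization path might look like the following, with a toy scoring function and an illustrative threshold standing in for a real fraud model; the point is the shape of the flow, score then decide, all within a tight latency budget.

```python
# A minimal sketch of real-time scoring; the model, features, and
# threshold are toy stand-ins, not a production fraud system.
import time

FRAUD_THRESHOLD = 0.9  # illustrative decision cutoff

def score_transaction(amount: float, merchant_risk: float) -> float:
    # Stand-in for a real model: a toy weighted sum capped at 1.0.
    raw = 0.002 * amount + 0.5 * merchant_risk
    return min(raw, 1.0)

def authorize(amount: float, merchant_risk: float) -> str:
    start = time.perf_counter()
    score = score_transaction(amount, merchant_risk)
    decision = "DECLINE" if score >= FRAUD_THRESHOLD else "APPROVE"
    elapsed_ms = (time.perf_counter() - start) * 1000
    # In production, the whole path must complete within a few milliseconds.
    return f"{decision} (score={score:.2f}, {elapsed_ms:.3f} ms)"

print(authorize(amount=45.00, merchant_risk=0.1))   # APPROVE
print(authorize(amount=900.00, merchant_risk=0.8))  # DECLINE
```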
Real-time processing is about achieving faster results, using contextual cues, immediate analytics, and action without hesitation; however, the technology has its limitations. Data quality can be questionable in certain circumstances: while a missing data stream may not heavily impact the accuracy of a marketing initiative, for many industrial purposes the effect could be harmful. Other AI models may depend on data sources that are not available for real-time retrieval or that do not support acceptable SLAs, which limits their ability to be invoked in real time. If, however, the environment includes AI models that are scored only under very specific scenarios or are infrequently scored, it is often better to take a real-time or on-demand approach rather than spend resources generating batch scores that would rarely be used.
Determining Next Steps
To implement an effective AI strategy, companies must consider the requirements of each use case or scenario. While batch processing historically served as the primary processing model, the outcomes made possible by event-driven and real-time processing allow data scientists to explore new horizons and leverage real-time, contextual data within their models.
The important thing to remember is that these approaches can work together. Batch processing, with its high accuracy, may be the most informed way to build AI models; event-driven processing can raise red flags when necessary and power a highly intuitive marketing campaign; and real-time processing allows for cost-effective customer service and many types of personalization. A skilled data science and ML engineering team must work together to determine which type of processing best suits a given use case and its constraints.