This blog is co-authored by Karen Yuan, a high school intern.

In our previous article, we learned to do exploratory data analysis using the Couchbase Query service. We also learned to efficiently read training data with the Query APIs in the Couchbase Python SDK and seamlessly save it to a pandas dataframe suitable for machine learning (ML). And finally, we stored ML models and their metadata in Couchbase. In this article, we will learn how to make predictions, store them in Couchbase and use the Query charts to analyze them.

Real-Time Prediction

The data scientist uses the trained model to generate predictions.

We will use the prediction flow in Figure 1 to predict the churn score in real time and store the prediction in Couchbase. The model is the churn prediction model we trained in the previous article.

Figure 1: Real-Time Prediction Flow

Function to read model and its metadata stored on Couchbase:

Function to read customer data stored on Couchbase:
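A minimal sketch of such a function, assuming each customer document is keyed by its customer ID as a string (an assumption; the real key scheme may differ). It relies on the Python SDK's key-value `get` and `content_as[dict]`. `FakeCollection` and `FakeGetResult` are hypothetical in-memory stand-ins so the snippet runs without a cluster, and the `MonthlyCost` field name is illustrative.

```python
def read_customer(collection, customer_id):
    """Fetch one customer document as a dict using a key-value get."""
    # Assumption: the document key is the customer ID as a string.
    result = collection.get(str(customer_id))
    # The SDK's GetResult exposes the document body via content_as[dict].
    return result.content_as[dict]


class FakeGetResult:
    """Hypothetical stand-in that mimics GetResult.content_as[dict]."""

    def __init__(self, value):
        self.content_as = {dict: value}


class FakeCollection:
    """Hypothetical in-memory stand-in for a Couchbase collection."""

    def __init__(self, docs):
        self._docs = docs

    def get(self, key):
        return FakeGetResult(self._docs[key])


customers = FakeCollection({"100002": {"customerID": 100002, "MonthlyCost": 30}})
record = read_customer(customers, 100002)
print(record)
```

Against a real cluster, the `collection` argument would be a collection object obtained from the bucket that holds the customer data.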

The following predict function reads the model, its metadata and customer records using the above functions. It converts the customer data into features using the same process as the one used during training (i.e., one-hot encoding). It then predicts the churn score by running the model on the features.
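The feature conversion step can be sketched as follows. This is an illustration under assumptions, not the code used in this article: the field names (`MonthlyCost`, `Tenure`, `Contract`) and the `encode_features` helper are hypothetical, and the column naming follows the `<field>_<value>` scheme that `pandas.get_dummies` produces by default. The key point is that the encoded record is aligned to the column order saved at training time.

```python
def encode_features(record, categorical_fields, numeric_fields, training_columns):
    """One-hot encode a record and align it to the training column order."""
    encoded = {}
    for field in numeric_fields:
        encoded[field] = float(record[field])
    for field in categorical_fields:
        # Same naming scheme as pandas.get_dummies: "<field>_<value>".
        encoded[f"{field}_{record[field]}"] = 1.0
    # Training columns missing from this record become 0.0;
    # categories never seen during training are dropped.
    return [encoded.get(column, 0.0) for column in training_columns]


# Hypothetical column list, as it might be stored in the model metadata.
training_columns = ["MonthlyCost", "Tenure", "Contract_Monthly", "Contract_Yearly"]
record = {"customerID": 100002, "MonthlyCost": 30, "Tenure": 4, "Contract": "Monthly"}

features = encode_features(record, ["Contract"], ["MonthlyCost", "Tenure"], training_columns)
print(features)  # [30.0, 4.0, 1.0, 0.0]
```

Storing the training column list alongside the model metadata is what makes this alignment possible at prediction time.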

The churn prediction for customerID 100002 is 1, which indicates that this customer is likely to leave the streaming service.

Figure: Churn prediction output

The prediction is saved in a Couchbase bucket called predictions using the code shown below. Create the predictions bucket on your Couchbase cluster before proceeding.
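As a rough sketch of the save step, the prediction can be wrapped in a small JSON document and written with the SDK's `upsert(key, value)`. The document layout, the `prediction:<customerID>` key format and the model name are assumptions made for illustration. `InMemoryCollection` is a hypothetical stand-in so the snippet runs without a cluster.

```python
import time


def build_prediction_doc(customer_id, prediction, model_name):
    """Assemble the JSON document that records one prediction."""
    return {
        "customerID": customer_id,
        "prediction": int(prediction),
        "model": model_name,
        "predicted_at": int(time.time()),  # epoch seconds
    }


def save_prediction(collection, customer_id, prediction, model_name):
    """Upsert the prediction document and return its key."""
    key = f"prediction:{customer_id}"  # hypothetical key format
    collection.upsert(key, build_prediction_doc(customer_id, prediction, model_name))
    return key


class InMemoryCollection:
    """Hypothetical stand-in exposing the same upsert(key, value) call."""

    def __init__(self):
        self.docs = {}

    def upsert(self, key, value):
        self.docs[key] = value


store = InMemoryCollection()
key = save_prediction(store, 100002, 1, "churn_model")
print(key, store.docs[key])
```

With a real cluster, `store` would be replaced by a collection from the predictions bucket, for example `cluster.bucket("predictions").default_collection()`.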

Verify that the prediction was successfully saved in Couchbase.

You can also run the trained model and generate predictions in Couchbase Analytics using the Python UDF feature (currently in developer preview). Refer to the article on running ML models using Couchbase Analytics Python UDF for more information.

What-if Analysis

The data scientist will analyze the predictions to answer questions that help make decisions.

In the previous article, we defined the problem as follows: the sales team at an online streaming service company wants to know whether increasing the monthly cost will maximize revenue while keeping customer churn in check.

To answer this, we will use the code below to predict the churn scores when the monthly costs are increased by $1, $2, etc. Results of this analysis will be stored in the predictions bucket.

Using the Couchbase cluster UI, create a scope called what_if_analysis and a collection called increase_monthly_cost in the predictions bucket. (Scopes and collections are available in Couchbase Server 7.0 and later.)
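The what-if loop can be sketched roughly as below. The `MonthlyCost` field name, the result document layout and the `toy_predict` function are assumptions for illustration; `toy_predict` is a hypothetical stand-in for the trained model and simply flags churn above a fixed price.

```python
def what_if_increase(customers, predict_churn, increases):
    """Re-run the churn prediction for each customer at each price increase."""
    docs = []
    for increase in increases:
        for customer in customers:
            scenario = dict(customer)  # copy so the original record is untouched
            scenario["MonthlyCost"] = customer["MonthlyCost"] + increase
            docs.append({
                "customerID": customer["customerID"],
                "increase": increase,
                "MonthlyCost": scenario["MonthlyCost"],
                "prediction": int(predict_churn(scenario)),
            })
    return docs


def toy_predict(record):
    """Hypothetical stand-in for the trained model."""
    return 1 if record["MonthlyCost"] > 32 else 0


customers = [
    {"customerID": 100001, "MonthlyCost": 30},
    {"customerID": 100002, "MonthlyCost": 32},
]
results = what_if_increase(customers, toy_predict, [1, 2])
for doc in results:
    print(doc)
```

Each resulting document could then be upserted into the increase_monthly_cost collection, which the SDK exposes as `bucket.scope("what_if_analysis").collection("increase_monthly_cost")`.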

To analyze the prediction results using Couchbase Query, create a primary index on the increase_monthly_cost collection in the what_if_analysis scope, as shown in the Query UI below. Note that the query context should be set to the predictions bucket and the what_if_analysis scope, as shown.

Figure: Query editor

Query charts can be used to analyze the prediction results. The chart below shows that ~7% of existing customers are predicted to churn if their monthly cost is increased by $1, ~10% will likely churn if the monthly cost is increased by $2, etc.

Figure: Query chart of predicted churn by monthly cost increase
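The percentage plotted in this kind of chart is a grouped average of the predictions. A small Python sketch of the same aggregation follows, using made-up toy documents (not the article's data) and the hypothetical `increase` and `prediction` field names.

```python
from collections import defaultdict


def churn_rate_by_increase(docs):
    """Return the percentage of customers predicted to churn per price increase."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for doc in docs:
        totals[doc["increase"]] += 1
        churned[doc["increase"]] += doc["prediction"]
    return {inc: 100.0 * churned[inc] / totals[inc] for inc in sorted(totals)}


# Toy data: four customers evaluated at two price increases.
docs = [
    {"increase": 1, "prediction": 0},
    {"increase": 1, "prediction": 0},
    {"increase": 1, "prediction": 0},
    {"increase": 1, "prediction": 1},
    {"increase": 2, "prediction": 0},
    {"increase": 2, "prediction": 1},
    {"increase": 2, "prediction": 1},
    {"increase": 2, "prediction": 0},
]
rates = churn_rate_by_increase(docs)
print(rates)  # {1: 25.0, 2: 50.0}
```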

The chart below shows that the current monthly revenue is $3.15 million. This revenue is predicted to increase by ~$50K if the monthly subscription cost of existing customers is increased by $1 and by ~$230K if the monthly cost is increased by $2. However, the revenue is predicted to dip if the monthly cost is increased by $3 or more because of the higher predicted churn rate.

Figure: Query chart of predicted monthly revenue by monthly cost increase

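The revenue arithmetic behind such a chart can be sketched as follows, assuming that customers predicted to churn contribute no revenue and that each stored document carries the already-increased `MonthlyCost` (both are assumptions for this illustration). The numbers are toy values, not the article's data.

```python
def projected_revenue(docs):
    """Sum the monthly cost of customers predicted to stay, per price increase."""
    revenue = {}
    for doc in docs:
        revenue.setdefault(doc["increase"], 0.0)
        if doc["prediction"] == 0:  # customer is predicted to stay
            revenue[doc["increase"]] += doc["MonthlyCost"]
    return revenue


def make_docs(increase, predictions, base_cost=10.0):
    """Build toy what-if documents for customers who all pay base_cost today."""
    return [
        {"increase": increase, "MonthlyCost": base_cost + increase, "prediction": p}
        for p in predictions
    ]


docs = make_docs(0, [0, 0, 0, 0]) + make_docs(1, [0, 0, 0, 0]) + make_docs(2, [0, 0, 0, 1])
revenue = projected_revenue(docs)
best = max(revenue, key=revenue.get)
print(revenue)  # {0: 40.0, 1: 44.0, 2: 36.0}
print(best)     # 1
```

In this toy case the $1 increase wins: the $2 increase raises the price further but loses one of four customers, which lowers total revenue.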
Using this analysis, we can conclude that the sales team at the online streaming service company can increase the monthly subscription cost by $2 to maximize the revenue while keeping the churn rate in check.

The “Download chart” option in the Query UI can be used to save the charts.

Couchbase Analytics Service

We used the Couchbase Query API in the Python SDK to read data from Couchbase. If you want to use the Couchbase Analytics API instead, here is an example that reads the data from Couchbase and stores it in a pandas dataframe.
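A minimal sketch of the read step, assuming the result of the SDK's `cluster.analytics_query(statement)` exposes its rows through `rows()`, as in recent SDK versions. The `customers` dataset name and the `FakeCluster` and `FakeAnalyticsResult` classes are hypothetical; the fakes only exist so the snippet runs without a cluster.

```python
def read_analytics_rows(cluster, statement):
    """Run an Analytics statement and collect the rows as a list of dicts."""
    result = cluster.analytics_query(statement)
    return list(result.rows())


class FakeAnalyticsResult:
    """Hypothetical stand-in for the SDK's Analytics result object."""

    def __init__(self, rows):
        self._rows = rows

    def rows(self):
        return iter(self._rows)


class FakeCluster:
    """Hypothetical stand-in exposing analytics_query(statement)."""

    def __init__(self, rows):
        self._rows = rows

    def analytics_query(self, statement):
        return FakeAnalyticsResult(self._rows)


statement = "SELECT c.* FROM customers AS c"  # hypothetical dataset name
cluster = FakeCluster([
    {"customerID": 100001, "MonthlyCost": 30},
    {"customerID": 100002, "MonthlyCost": 32},
])
rows = read_analytics_rows(cluster, statement)
print(rows)
```

With pandas installed, `pandas.DataFrame(rows)` turns this list of dicts into a dataframe with one column per field.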

The Couchbase Analytics service can also be used for EDA, data visualization and to run trained ML models (developer preview). Refer to the N1QL for Analytics Language Reference and the article on running ML models using Couchbase Analytics Python UDF for more information.

Conclusion

In this and the previous article, we learned how Couchbase makes data science easy. Using customer churn prediction as an example, we saw how to perform exploratory analysis using the Query service, and how to efficiently read large training datasets using the Python SDK and store them in a data structure suitable for ML.

We also saw how to store ML models, their metadata and predictions in Couchbase and how to use the Query charts for analyzing predictions.

The Couchbase Data Platform can be used to store raw data, features, ML models, their metadata and predictions on the same cluster as the one running Query and Analytics services. This makes the process fast and easy by reducing the number of tools needed for data science.

Next Steps

If you’re interested in learning more about machine learning and Couchbase, here are some great next steps and resources to get you started:

Author

Posted by Poonam Dhavale, Principal Software Engineer

Poonam Dhavale is a Principal Software Engineer at Couchbase. She has over 20 years of experience in design and development of distributed systems, NoSQL, high availability, and storage technologies. She holds multiple patents in distributed storage systems and holds certifications in machine learning and data science.
