| dc.contributor.author | Suleymanli, Emil | |
| dc.date.accessioned | 2025-10-28T08:26:31Z | |
| dc.date.available | 2025-10-28T08:26:31Z | |
| dc.date.issued | 2025-04 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12181/1510 | |
| dc.description.abstract | Kubernetes has emerged as the standard platform for managing microservices at scale, offering robust orchestration capabilities. However, ensuring optimal performance under dynamic and often predictable workload fluctuations remains a significant challenge. Traditional autoscaling mechanisms, such as the Horizontal Pod Autoscaler (HPA), rely on reactive policies that adjust resources based on current metrics like CPU utilization. While effective in many cases, reactive scaling often lags behind sudden traffic surges, leading to temporary service degradation or resource inefficiency. This thesis addresses these limitations by proposing a predictive autoscaling framework for Kubernetes-based microservices that integrates machine learning-based forecasting with intelligent load balancing. The proposed solution leverages a Long Short-Term Memory (LSTM) neural network trained on twelve months of real-world microservice load data. The model forecasts short-term workload trends, enabling the system to proactively adjust pod counts before demand peaks occur. In parallel, a custom load balancing mechanism was developed to distribute traffic more efficiently based on runtime pod metrics such as CPU usage and response time, ensuring that scaled-out resources are utilized effectively. An experimental Kubernetes cluster was set up to evaluate the predictive scaling approach against the standard HPA under realistic load patterns, including the sharp end-of-month traffic surges observed in the banking sector of Azerbaijan. Results show that the predictive autoscaler achieved a mean absolute percentage error (MAPE) under 10% during normal periods and around 12–15% during peak salary-day surges. Compared to HPA, the predictive system reduced 95th-percentile response times by up to 37% during load spikes, maintained full throughput without request drops, and triggered fewer, better-timed scaling actions. CPU utilization stayed within safer bounds, avoiding the saturation seen under reactive scaling. This work demonstrates that predictive autoscaling can significantly enhance the resilience and efficiency of Kubernetes-managed microservices. By combining accurate load forecasting with intelligent traffic distribution, the system improves both user experience and infrastructure utilization. While challenges such as prediction errors and model retraining remain, the results highlight the practical benefits of integrating machine learning into cloud-native scaling strategies. Future work can extend this approach by exploring hybrid models that combine predictive insights with reinforcement learning, or by refining load balancing strategies to further optimize service quality during unpredictable demand fluctuations. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | ADA University | en_US |
| dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
| dc.subject | Kubernetes (Computer software). | en_US |
| dc.subject | Cloud computing -- Resource allocation. | en_US |
| dc.subject | Microservices (Computer architecture) -- Management. | en_US |
| dc.subject | Machine learning -- Applications in systems performance. | en_US |
| dc.subject | Load balancing (Computers) -- Performance optimization. | en_US |
| dc.subject | Azerbaijan -- Banking sector -- Data analysis. | en_US |
| dc.title | Predictive Scaling and Load Balancing for Kubernetes-Based Microservices | en_US |
| dc.type | Thesis | en_US |
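The abstract reports forecast accuracy as mean absolute percentage error (MAPE). A minimal sketch of how that figure is computed; the sample request-rate data below are hypothetical, not taken from the thesis:

```python
# Mean absolute percentage error (MAPE), the accuracy metric cited in
# the abstract (under 10% in normal periods, 12-15% during surges).

def mape(actual, predicted):
    """Return MAPE as a percentage; skips zero actuals to avoid division by zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs(a - p) / a for a, p in pairs) / len(pairs)

# Hypothetical hourly request rates: observed load vs. an LSTM forecast.
actual = [100, 120, 150, 300, 280]
predicted = [95, 130, 140, 270, 300]
print(round(mape(actual, predicted), 2))  # → 7.43
```

A value like 7.43% would fall in the "normal period" accuracy range the abstract describes.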
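The proactive scaling step described in the abstract, adjusting pod counts ahead of forecast demand rather than after metrics degrade, can be sketched as a simple capacity calculation. Everything here (function name, per-pod capacity, headroom factor, replica bounds) is an illustrative assumption, not the thesis's actual implementation:

```python
import math

def desired_replicas(forecast_rps, per_pod_rps, min_pods=2, max_pods=20, headroom=1.2):
    """Replicas needed to serve the forecast request rate with some headroom,
    clamped to the cluster's configured scaling bounds."""
    needed = math.ceil(forecast_rps * headroom / per_pod_rps)
    return max(min_pods, min(max_pods, needed))

# Hypothetical forecasts: a salary-day surge vs. a quiet period.
print(desired_replicas(forecast_rps=900, per_pod_rps=100))  # → 11
print(desired_replicas(forecast_rps=150, per_pod_rps=100))  # → 2
```

Because the forecast arrives before the surge, the scale-out can complete while a reactive HPA would still be waiting for CPU utilization to cross its threshold.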