Challenge and Background:
The main challenge was to build a system that offers comprehensive monitoring across data sources, cloud services and private servers, including SQS message queues, Power MTA, AWS SES and SMS integrators, along with visualization of critical automations. The system needed not only to track the performance and efficiency of these services, but also to ensure high security and operational continuity for enterprise customers.
Originally, the system was intended for our internal monitoring. However, after identifying a recurring problem at one client, an insurance company whose automatic contact-list uploads were failing and disrupting the execution of its automations, we decided to adapt the scope of our solution. We created a specific tenant account for this client, allowing them to monitor their own processes exclusively. This tailored approach opened up a significant opportunity for us: the implementation of personalized, high-value-added monitoring dashboards for other enterprise clients as well. The processes managed for these clients include the verification of insurance policies for emergency admissions in clinics, document generation for the sale of insurance policies and OTP for banking transactions, all vital and with a considerable economic impact for our business clients.
Implemented Solution:
The system allows the Security and Operational Efficiency teams of organizations to react proactively to anomalous situations in the business processes they maintain on the DANAconnect platform. This significantly reduces downtime and helps keep operations running optimally. Additionally, the monitoring system takes advantage of logging and log analysis tools, centralizing data collection and simplifying problem diagnosis and resolution.
DANAconnect took a hybrid-cloud approach to building its multi-tenant monitoring system, integrating cutting-edge tools and cloud services:
- AWS CloudWatch: Used for real-time data and metrics collection, providing detailed tracking of critical activities in services such as AWS S3 and SQS.
- Prometheus: This open-source monitoring system was crucial for collecting and analyzing metrics related to SMS processing and other services.
- Amazon RDS: Offered the ability to perform queries in real time, ensuring the accuracy and integrity of the data.
- OpenSearch: Enabled efficient extraction and analysis of SMTP and API transaction logs, facilitating detailed analysis and generation of useful reports.
- Fluent Bit: Implemented for efficient extraction of logs from multiple applications, including Docker containers and Kubernetes pods on AWS EKS, centralizing this information in OpenSearch.
- Grafana: Used as a user-friendly interface for displaying data from various sources. Grafana provided intuitive, real-time dashboards, facilitating the interpretation of complex data through graphical representations such as pie charts, time series, bar charts, tables and histograms.
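To illustrate how a tenant account could be limited to its own data in a stack like the one listed above, the following is a minimal sketch, not DANAconnect's actual implementation. It builds an OpenSearch query body that filters logs by tenant and time window. The field names `tenant_id` and `@timestamp`, the function name `tenant_log_query` and the result size are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def tenant_log_query(tenant_id: str, minutes: int, now: datetime) -> dict:
    """Build an OpenSearch query body limited to one tenant and a time window.

    The field names `tenant_id` and `@timestamp` are hypothetical; a real
    index would use whatever fields the log shipper actually writes.
    """
    if not tenant_id:
        # Refuse to build an unscoped query that could expose other tenants.
        raise ValueError("tenant_id is required")
    start = now - timedelta(minutes=minutes)
    return {
        "query": {
            "bool": {
                # Filter clauses restrict results without affecting scoring.
                "filter": [
                    {"term": {"tenant_id": tenant_id}},
                    {
                        "range": {
                            "@timestamp": {
                                "gte": start.isoformat(),
                                "lte": now.isoformat(),
                            }
                        }
                    },
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": 100,
    }


if __name__ == "__main__":
    body = tenant_log_query("insurer-a", 15, datetime.now(timezone.utc))
    print(body)
```

Applying the tenant filter in one place, rather than in each dashboard panel, is one way to reduce the chance that a panel accidentally shows another client's records.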
Results:
DANAconnect’s advanced monitoring dashboard has proven to be an innovative and effective solution. It enables real-time visualization through interactive panels, including pie charts, time series, bar charts, tables, and histograms. This tool provides DANAconnect and its financial sector clients with detailed insight and improved operational control.
Impact on Clients:
Senior technology executives at financial companies have found in this system an invaluable tool for the continuous supervision and proactive management of their operations. Automatic alerts and log analysis capabilities have significantly improved their ability to react to incidents, reducing downtime and helping to maintain optimal service.
This case study highlights how DANAconnect, through the strategic integration of advanced technologies and the use of AWS CloudWatch, has managed to create a monitoring system that not only meets the security and efficiency demands of financial enterprises, but also provides an intuitive and highly functional platform for operational management and data-driven decision-making.
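The automatic alerts mentioned in this section can be pictured with a small example. The sketch below is purely illustrative and does not reflect how DANAconnect's alerts are actually configured. It assumes a hypothetical rule, `UploadAlertRule`, that fires when the number of failed contact-list uploads in a recent window exceeds a threshold, with the status strings `"ok"` and `"failed"` invented for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UploadAlertRule:
    """Hypothetical rule: alert when too many list uploads fail in a window."""

    max_failures: int  # highest number of failures tolerated before alerting

    def evaluate(self, statuses: Sequence[str]) -> bool:
        """Return True when the failure count exceeds the threshold."""
        failures = sum(1 for status in statuses if status == "failed")
        return failures > self.max_failures


if __name__ == "__main__":
    rule = UploadAlertRule(max_failures=2)
    recent = ["ok", "failed", "failed", "failed"]
    if rule.evaluate(recent):
        print("ALERT: contact-list uploads are failing repeatedly")
```

In practice a rule like this would more likely live in the alerting layer of the monitoring stack than in application code; the sketch only shows the threshold logic.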
Lessons Learned:
Flexibility and Adaptation: Initially designed for internal use, the system’s adaptability to client-specific needs illustrated the importance of flexible and scalable solutions. This underscores the value of designing systems that can easily be customized for different use cases.
Client-Centric Approach: The development of tenant-specific accounts highlighted the significance of a client-focused approach. Tailoring solutions to meet unique client challenges can lead to more effective and appreciated services.
Cross-Functional Collaboration: The project’s success was also a result of effective collaboration between different teams, such as development, operations, and customer service. This highlighted the importance of cross-functional teamwork in tackling complex technical challenges and delivering comprehensive solutions.
Continuous Improvement and Feedback Loop: Regular feedback from clients after implementing the tenant-specific dashboards was crucial for continuous improvement. This iterative process of receiving and acting on feedback ensured that the system consistently met and exceeded client expectations.
Risk Management and Proactive Problem Solving: The experience also taught the importance of anticipating potential problems and proactively developing solutions. This proactive mindset helped in quickly adapting the system for client-specific needs, preventing larger issues down the line.
Market Opportunity Identification: Finally, the project was a lesson in recognizing and seizing market opportunities. The shift from an internal tool to a client-facing solution opened new business avenues, highlighting the importance of being alert to market needs and responsive to potential business opportunities.