Your Azure solution is nearing production. The final thing you need to solve is how to monitor your app performance and availability, and how to get notified of potential problems. This is a quickstart to monitoring on Azure.
In a nutshell
The image below shows the monitoring and alerting pipeline used on Azure. A similar structure exists in most monitoring tools I’ve seen; only the component names change from vendor to vendor.
Read the image from left to right. The information flows through the pipeline and gets refined at each step, from the raw log files on the left to the insights and actions on the right. Let’s walk through the pipeline step by step.
The log files originate from source systems. On traditional on-premises systems you had direct access to the (virtual) machines producing the logs. You could log on and view or download the log files, the Windows event log or syslog as needed. If you use virtual machines on Azure, you can of course still do this. But for modern PaaS services like SQL Database, Web App, Logic App or Functions you usually don’t have this option. In short, centralized log collection on Azure happens like this:
Default logs are produced directly by the Azure resource. On virtual machines the default logs contain things like CPU utilization, disk I/O, network ingress/egress and memory usage. On PaaS services the log content varies by resource type: SQL Database metrics track database locks, SQL errors and such, while Web Apps track response times, HTTP status codes and so on.
Collecting default logs is usually very easy: just flick on the “Collect logs to centralized log storage” switch in the resource’s logging settings. You also need to specify which logger instance stores the logs, as you can have multiple instances of these loggers if you wish.
If your application runs on an Azure virtual machine and produces custom logs you’d like to store, you configure a custom log collector. Examples include a VM running Apache, Nginx, JBoss or Tomcat, which produce access, stdout and stderr logs you may want to store and analyse.
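To give an idea of what such custom logs contain, here is a minimal sketch that splits one access log line into named fields. It assumes the common log format prefix that Apache and Nginx access logs typically share. The function name and the sample line are made up for illustration, and this is not how the Azure collector itself is configured.

```python
import re
from typing import Optional

# Assumption: lines start with the common log format fields
# (client, identity, user, [time], "request", status, size).
ACCESS_LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Turn one access log line into a dict, or None if it does not match."""
    match = ACCESS_LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no body was sent; store it as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line, not taken from a real system.
sample = '203.0.113.7 - - [10/Oct/2019:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
parsed = parse_access_line(sample)
print(parsed)
```

Once fields like the status code and path are separated, questions such as "how many 5xx responses did we serve last hour" become simple queries.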
You can also send log data from pretty much anywhere: on-premises virtual machines, VMware, containers, or your own code through the REST APIs. That is out of scope for this article, though.
Centralized log storage
This is the very heart of the system. The storage resides in the Azure cloud and stores all the incoming log data from the different log streams. At the moment the storage comes in two flavors: Log Analytics and App Insights. Each Azure resource can write its logs to one of these stores.
Log Analytics typically stores data on how your hardware performs or what happens in your operating system. This is the kind of data you have collected for years from on-premises systems: CPU utilization, I/O rates, Windows event logs, syslog and such. Azure AD audit logs end up here as well.
App Insights stores application-related data and is used mainly by PaaS services. The data can be split further into 1) automatically collected performance data, like response times and the error codes the application returns, and 2) application metrics. Application metrics enable both profiling the code execution and tracking business events.
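To make the two uses of application metrics a bit more concrete, here is a rough sketch of timing a piece of code and recording a business event. It deliberately uses a plain in-memory list instead of a real SDK, and every name in it (`telemetry`, `track_event`, `track_duration`, the event and metric names) is hypothetical. In a real application the App Insights SDK for your language would do the recording.

```python
import time
from contextlib import contextmanager

# In-memory stand-in for a telemetry sink; a real app would use an SDK instead.
telemetry = []


def track_event(name, **properties):
    """Record a business event, e.g. an order being placed."""
    telemetry.append({"kind": "event", "name": name, "properties": properties})


@contextmanager
def track_duration(name):
    """Record how long the wrapped block took, in milliseconds."""
    started = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - started) * 1000
        telemetry.append({"kind": "metric", "name": name, "value_ms": elapsed_ms})


# Profiling: measure one step of the code.
with track_duration("checkout.calculate_total"):
    total = sum(price * qty for price, qty in [(9.99, 2), (4.50, 1)])

# Business event: something the business cares about happened.
track_event("OrderPlaced", total=round(total, 2), currency="EUR")

for item in telemetry:
    print(item)
```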
The only difference between Log Analytics and App Insights is the schema (= storage structure) in which the log information is stored. Most Log Analytics data can be found under LogManagement, whereas App Insights uses a schema better suited to monitoring PaaS applications. Users can’t alter this schema or decide into which branch a log is written; this is defined by Microsoft.
Image: Log Analytics and App Insights schemas on top level
Now that we have all our logs nicely stored in a single location, it’s time to start digging into the contents. The tool for querying the content is the Log Query. You can open it from App Insights by clicking Monitoring/Logs (Analytics), and from Log Analytics by clicking the General/Logs icon. Both take you to the same query tool.
Image: How to find the log query tool
The basic use for queries is to see the current state of your system. These are called ad hoc queries: you run the query to find out about a specific aspect of the system that you need to know right now. You write these queries in Kusto, an SQL-like query language. To help you get started with Kusto, the tool has a number of examples to get you going.
You can save the query to 1) reuse it later, 2) publish the result to a dashboard or 3) create an alert based on the query result.
Like in SQL, the query results can be displayed as a list (like the latest SecurityEvent details during the last hour), a graph (a bar chart showing the number of exceptions per hour during the last day) or a single number (the number of exceptions during the last 10 minutes).
You can publish any of these query results to a dashboard. To do this, save the query and click “Pin to dashboard” in the top menu. Or you can create an alert based on the results by clicking the “+ New Alert” button in the top menu.
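The three result shapes mentioned above (list, graph, single number) could look roughly like this in Kusto. The queries are kept as plain strings in a small Python script so they are easy to copy into the query tool. I’m assuming the standard table and column names here (`SecurityEvent` with `TimeGenerated` on the Log Analytics side, `exceptions` with `timestamp` on the App Insights side); the dictionary keys and the row limit of 50 are my own picks.

```python
# Kusto queries kept as plain strings; paste them into the log query tool to run them.
QUERIES = {
    # List: latest SecurityEvent rows from the last hour (Log Analytics schema).
    "latest_security_events": """
SecurityEvent
| where TimeGenerated > ago(1h)
| order by TimeGenerated desc
| take 50
""",
    # Graph: exceptions per hour over the last day (App Insights schema).
    "exceptions_per_hour": """
exceptions
| where timestamp > ago(1d)
| summarize exceptionCount = count() by bin(timestamp, 1h)
| render barchart
""",
    # Single number: exceptions during the last 10 minutes.
    "recent_exception_count": """
exceptions
| where timestamp > ago(10m)
| count
""",
}

for name, query in QUERIES.items():
    print(f"--- {name} ---{query}")
```

Note how each query reads top to bottom as a pipeline: pick a table, filter by time, then shape the output.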
Insights and actions
Dashboards provide insight into how the system has operated over a given period of time. You create a dashboard (or multiple dashboards for different aspects of the system) to monitor the runtime behaviour of the system. When something is about to fail (for example, response times rising over time under increased load), a good dashboard usually provides an early warning and a clue (or even a direct link) on where to start digging into the details.
Image: Dashboard example
Of course, nobody needs to spend their time staring at dashboard lines going up and down. That’s why we have alerts. You may create an alert rule that triggers when the number of exceptions thrown from your web app exceeds 5 per hour. Or a rule that triggers when the average response time of the web app is more than 5 seconds over the last 5 minutes.
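The logic behind those two example rules is simple enough to sketch in a few lines. This is only an illustration of the threshold-over-a-time-window idea, not how Azure evaluates alert rules internally. The function names and the sample data are made up.

```python
from datetime import datetime, timedelta


def exceptions_rule_fires(exception_times, now, limit=5, window=timedelta(hours=1)):
    """True when more than `limit` exceptions occurred within `window` before `now`."""
    recent = [t for t in exception_times if now - window <= t <= now]
    return len(recent) > limit


def response_time_rule_fires(samples, now, limit_seconds=5.0, window=timedelta(minutes=5)):
    """True when the average response time within `window` exceeds `limit_seconds`.

    `samples` is a list of (timestamp, duration_in_seconds) tuples.
    """
    recent = [duration for t, duration in samples if now - window <= t <= now]
    if not recent:
        return False  # no data in the window, nothing to alert on
    return sum(recent) / len(recent) > limit_seconds


# Made-up sample data: minutes before "now" at which things happened.
now = datetime(2019, 6, 1, 12, 0, 0)
exception_times = [now - timedelta(minutes=m) for m in (1, 5, 12, 20, 33, 47, 90)]
samples = [(now - timedelta(minutes=m), s) for m, s in ((1, 7.0), (2, 6.0), (4, 4.0), (9, 30.0))]

print(exceptions_rule_fires(exception_times, now))
print(response_time_rule_fires(samples, now))
```

Both rules look only at data inside their time window, so the exception from 90 minutes ago and the slow request from 9 minutes ago do not count.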
When you create an alert, you state what happens when the alert triggers: the Action part of the system. You can send an email to a recipient group, or you can have restoring actions run automatically by using webhooks. When the load gets too high, call an automation script that adds another instance to handle the increased load. Or when things get too slow on your virtual machine, reboot it with automation.
A friendly reminder: if you follow good DevOps practices, monitoring should of course not be left as the final stage of building the application. Performance, smoke and security tests and the like are added as the system is being built, not at the end.
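Conceptually, the action part is just "when the alert fires, run everything on the list". Here is a small sketch of that idea. The payload shape, the function names and the two actions are all hypothetical. Real actions would send the email or POST the payload to a webhook URL, whereas these only return a description of what they would do.

```python
import json


def build_webhook_payload(rule_name, observed_value, threshold):
    """Serialize the alert details a webhook receiver might need (hypothetical shape)."""
    return json.dumps({"rule": rule_name, "observed": observed_value, "threshold": threshold})


def run_actions(alert_fired, actions, payload):
    """Call every configured action when the alert fired; return what was run."""
    if not alert_fired:
        return []
    return [action(payload) for action in actions]


# Hypothetical actions: real ones would send mail or POST to a webhook URL.
def notify_ops_team(payload):
    return f"email to ops team: {payload}"


def scale_out(payload):
    return f"webhook call to add an instance: {payload}"


payload = build_webhook_payload("HighLoad", observed_value=92, threshold=80)
results = run_actions(alert_fired=True, actions=[notify_ops_team, scale_out], payload=payload)
for line in results:
    print(line)
```

Keeping notification and remediation as separate actions means you can start with email only and add automation once you trust the rule.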
For more information
This was a short introduction to monitoring on Azure. To make things confusing enough, note that Microsoft is currently renaming the monitoring sphere to “Azure Monitor”. This name will most likely be used to reference both Log Analytics and App Insights in the future. But at the moment, inside Azure Monitor you’ll find the good old Log Analytics and App Insights.
That’s pretty much the basics. Enjoy!