Streamline Snowflake Error Logs with Real-time Notifications to Slack Channel

Introduction

Strong data management systems are essential in the digital world because data is essential to enterprises. Due to its scalability, flexibility, and usability, Snowflake, a cloud-based data warehouse system, has grown in popularity. However, just like any other system, mistakes can happen and negatively impact corporate operations.

Having a system in place to identify and alert stakeholders is crucial for reducing the effects of errors. Sending error messages to Slack users or channels is one approach to accomplishing this. Slack is a well-liked network for team communication that promotes easy cooperation, making it a great choice for error notification dissemination.

Setting up a Snowflake task to record the issue and a Slack bot to convey the message to the intended recipients is required for sending error notifications from Snowflake to Slack users or channels. Snowflake’s tasks, which allow users to plan and automate data processing workflows, can be used to automate this operation.

Setting up Slack Bot for error notification from Snowflake

The steps for configuring a Slack bot to send out error notifications are as follows:

Step 1: In Slack, create a new bot user.

In Slack, the first step is to establish a new bot user. Visit the Slack API website and log in using your Slack credentials to complete this. After logging in, select “Create a Slack app” from the menu and then follow the on-screen directions to build a new app. Following the creation of the app, you may add a new bot user by selecting “Bot users” from the “Features” part of the app setup page.

Step 2: Create an API token for the bot’s user.

In order to authenticate the bot with the Slack API, we must create an API token for the bot user. To accomplish this, select “Install App” and adhere to the on-screen directions to grant the app access to our Slack workspace. Once the app has been given permission, we can create an API token by selecting “OAuth & Permissions” from the list of options under “Features” on the app settings page. The API token should be copied and saved for further usage. Enable receiving the workspace URL via incoming webhooks as well.

Step 3: Add the bot user to Slack channels.

We can next add the bot user to the Slack channels that will receive error messages from Snowflake after creating the API token. Go to the Slack workplace and find the relevant channels there to achieve this. next look for the bot user we created earlier by selecting the “Add apps” option. Once the bot user has been located, click “Add” to add it to the channel.

Step 4: Configure Snowflake to send error notifications to Slack.

The last step is to set up Snowflake to use the bot user and API token to send error warnings to Slack. Setting up a Snowflake job that records the problem and instructs the Slack bot to send the notification will do this. Depending on the requirements for error notification, the Snowflake job can be configured to execute at a specific frequency, such as every hour or every day.

We must develop a stored procedure that searches the error log table and extracts the error details in order to configure the Snowflake task. The error message can then be sent from the stored procedure to the Slack bot, which will subsequently relay it to the chosen channels, using the Snowflake API. The bot user will be authenticated with the Slack API using the API token previously generated.

Snowflake procedures are multi-language functional, which makes it easier for developers. The procedure is implemented in JavaScript, but it can also be written in Python and Java.

The output shown below illustrates how the JavaScript code was used to access the error log data

To get error information for queries that were executed within the previous 24 hours, this stored procedure runs a query against the TASK_HISTORY table in the INFORMATION_SCHEMA. A JSON object including the query ID, error code, error message, scheduled time, next schedule time, finished time, and duration for each error is returned as the results. Through the connectors, we can ensure that the results are transferred to our desired place as a table, a sheet, or an Excel file.

This saved process can be modified to meet our unique needs for error notification, such as filtering errors based on particular error codes.

Also Read: Unlocking the Power of Data Democratization: Empowering Your Entire Organization with Access to Data

Create a Snowflake task to capture and send notifications to Slack.

Now, using our method and the Slack token we established, we will integrate this error log with Slack to alert the users. This is done by setting up a snowflake task to run every five minutes (this may be altered depending on the requirement and available credits), which will notify Slack of any issues.

To bring the API endpoint and bot token to configure our tasks in the Slack channel and integrate the notification flow, we should construct two important key components in our script. To ensure a stronger grasp on the logs, we also have a number of security measures and constraints that may be applied from both Snowflake’s and Slack’s ends. The task scheduler built into Snowflake, which manages schedule time management and smooth integration, carries out the timetable.

// set up the Slack API endpoint

var slackUrl = ‘<Our Slack bot API endpoint here>’;

// set up the Slack bot token

var slackToken = ‘<Our Slack bot token here>’;

This task, which is scheduled to run every five minutes, invokes a saved function. The stored method searches the QUERY_HISTORY_ERRORS table of the SNOWFLAKE. Use the ACCOUNT_USAGE schema to look for issues that occurred during the last five minutes. If there are problems, it creates a Slack message payload and uses the bot token and endpoint of the Slack API to deliver it to the selected Slack bot. To keep track of the number of errors that have occurred at a particular time or for a specific length of time, the messages include a counter for each error that is encountered. We may check the status of our task by calling it and using,

Show tasks like ‘task_name’ in task_location

This task and stored procedure can be modified to meet our unique error notification needs, such as by altering the error time window or the Slack message content.

Best practises for setting up error notification thresholds and escalation procedures.

Setting up error notification thresholds and escalation processes is crucial for making sure that urgent problems are dealt with and fixed right away. When establishing these procedures, keep the following recommended practises in mind:

  1. 1. Establish notification levels: Based on the severity and significance of the issue, establish clear and simple thresholds for error alerts. For instance, we might prefer to be notified of all significant errors, but only if minor errors happen more frequently than a predetermined threshold.
  2. 2. Escalation protocols: Establish escalation protocols to guarantee that urgent concerns are handled right away. If problems are not handled within a predetermined amount of time, this may entail notifying management or higher-level support teams.
  3. Frequently test our notification processes to make sure that alerts are being sent accurately and that escalation processes are working as intended.
  4. 3. Establish a procedure for prioritising and triaging issues in accordance with their seriousness and impact. By doing this, it may be possible to guarantee that urgent problems get attention first and that resources are allocated effectively.
  5. 4. Record error alerts: Watch and record error alerts to spot patterns and trends. This can assist in identifying persistent problems and guide future system upgrades.
  6. 5. Continually examine and enhance our notification protocols: We must always assess and enhance our notification protocols to make sure they are reliable and effective. This could entail streamlining notification workflows and processes, integrating new technology, or taking customer and support team comments into account.

By adhering to these recommendations, you can make sure that your error notification levels and escalation processes are trustworthy, efficient, and capable of handling urgent situations quickly.

Benefits of using Slack for error notification over email

Slack is a real-time communication platform that enables teams to cooperate and communicate effectively, therefore, it has several advantages over email in terms of alerting users. As a result, notifications are sent immediately and are readily accessible to all team members who have access to the appropriate Slack channel. Email notifications, on the other hand, run the risk of being overlooked, delayed, or lost in a busy inbox, which could have a greater negative impact on business.

Additionally, Slack offers more personalization options for notifications. Users can set up notifications to be sent in several formats, such as text, graphics, and links, which can be customised to fit certain use cases. Teams can better comprehend the failed job with the help of this flexibility, which can be important for troubleshooting and debugging.

Slack can streamline the entire incident management process because it interfaces with a broad variety of third-party applications and services, like Jira and GitHub. For instance, a Slack bot can automatically generate an incident in Jira, assign it to the proper team member, and link it to the relevant chat message when a failed job is identified. The time and effort needed to manage incidents can be greatly reduced because of this connectivity between Slack and other applications, which leads to quicker resolution times and lower operational expenses.

Slack also offers improved process visibility for incident response. Team members can quickly see who is reacting to an incident, what steps are being taken, and when the situation is addressed when notifications are given using Slack channels. This openness encourages responsibility and can assist teams in determining where their incident management procedures need to be strengthened.

The screenshots below show the inability to distinguish a few clear benefits of Slack over email. The first screenshot displays the failure-related email notification, which simply includes the bare minimum of an ID and a description. The user is additionally shown in the second screenshot being triggered and monitoring the member for a longer period of time.

Common error scenarios in Snowflake and how to handle them with Slack notification.

Although Snowflake is a strong data warehousing technology that enables effective data storage and analysis, it can have faults that have an influence on data processing and analysis, just like any complicated system. Following are some typical Snowflake fault scenarios and solutions that utilise Slack notification:

  1. 1. Query timeouts: If the query takes too long to run or if there are resource limitations, Snowflake may experience query timeouts. Slack notifications can be used to handle this mistake by notifying users or administrators that the query has timed out and informing them of the solution. We could also set up alerts to let people know when a lengthy query is active.
  2. 2. Query failures: Queries might fail for a number of reasons, including incorrect syntax or data issues. Users or administrators can be informed through Slack notice when a query has failed and given instructions on how to fix the problem. To further assist in identifying and resolving the problem, we could also want to provide thorough error messages and logs.
  3. 3. Resource limitations: If not enough resources are available to conduct a query, Snowflake may experience resource limitations. Users or administrators can be informed of resource constraints using Slack notifications, and they can be given instructions on how to allocate more resources or improve the query.
  4. 4. Data load failures: Snowflake may experience data loading difficulties if the data is incorrectly formatted or has other errors. Users or administrators can be informed through Slack notice that a data load has failed and given instructions on how to fix the problem. To further assist in identifying and resolving the problem, we could also want to provide thorough error messages and logs.
  5. 5. Data processing errors: If the data is incorrectly prepared or contains errors, Snowflake may experience data processing difficulties. Users and administrators can be informed of data processing errors and given instructions on how to fix them via Slack notifications. To help with the problem’s diagnosis and resolution, we could additionally want to provide thorough error messages and logs.

Conclusion

It’s crucial to set up error reporting processes if we’re to keep our Snowflake data warehouse reliable and accessible. We can make sure that issues are resolved quickly and that severe errors are escalated to the relevant employees by collecting error information and delivering notifications to Slack channels.

We talked about how to automate the process using Snowflake’s stored procedures and tasks, as well as how to build up a Slack bot to collect error notifications from Snowflake. Defining notification thresholds, utilising various notification channels, and routinely testing notification procedures were some of the best practises we discussed for setting up error notification thresholds and escalation procedures.

By adhering to these best practises, we can build a strong error notification system that minimises downtime while assisting you in swiftly identifying and resolving issues. Setting up issue notifications using Slack may give any data analyst, data engineer, developer, or database administrator access to a potent tool for tracking and maintaining the dependability of your Snowflake data warehouse.

Get in Touch to Supercharge your data and analytics journey with our comprehensive services.

Click here



Author: Shreyanth S
Shreyanth is a dynamic and passionate Data Engineer with a successful track record spanning 4 years. Holding an MTech degree in Data Science and Engineering from BITS Pilani, he has honed his expertise in harnessing data for actionable insights, statistical methods, machine learning, and predictive modeling. With a year of experience as a Product Architect in his repertoire, Shreyanth brings a unique blend of technical prowess and strategic vision to the table. As a vital member of the data engineering team, he plays a key role in developing, designing, implementing, and optimizing data infrastructure, including complex data pipelines, ETL processes, data warehousing solutions, and data management.