Full-Stack Observability on Heroku with the Datadog Add-On

Gain complete visibility into your Heroku app's health with Datadog's observability features, ensuring smooth performance and effective monitoring.

Want to keep your Heroku app running smoothly? Datadog’s add-on for Heroku gives you full visibility into your app’s health - backend, frontend, and everything in between.

Here’s what you’ll get:

  • Complete Monitoring: Track infrastructure, app performance, logs, and user experience in one place.
  • Easy Setup: Add the Datadog buildpack to your Heroku app or configure it for Docker-based setups.
  • Key Metrics & Alerts: Monitor CPU, memory, response times, and more. Set alerts for issues before they impact users.
  • Cost Control: Use features like log sampling and metric filtering to manage monitoring expenses.

Quick Setup Steps:

  1. Install the Datadog buildpack via the Heroku CLI.
  2. Configure critical environment variables like DD_API_KEY.
  3. Enable runtime metrics and system-level monitoring.
  4. Build dashboards and set alerts for real-time insights.

Datadog simplifies monitoring, so you can focus on scaling your Heroku app without worrying about performance issues.

Installing Datadog on Heroku

Here's how to set up Datadog monitoring for your Heroku app.

Add-On Installation Steps

Before starting, make sure you have an existing Heroku application.

  • Use the Heroku CLI to add the Datadog buildpack (see the example after this list).
  • Configure ports 8125 (for metrics) and 8126 (for traces).
  • Confirm the installation by checking the directory at /app/.apt/opt/datadog-agent.
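
For example, a minimal sketch of the CLI steps (your_app_name is a placeholder, and the buildpack only takes effect on the next deploy):

heroku buildpacks:add --index 1 heroku-community/datadog -a your_app_name
git commit --allow-empty -m "Rebuild slug with Datadog buildpack"
git push heroku main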

If you're working with Docker-based applications, you'll need to adjust your Dockerfile instead. Below is an example for Debian-based images. Note that the repository's signing key must be downloaded before the repository can be used, and the API key should be supplied at runtime rather than baked into the image:

# Add the Datadog repository and signing key
RUN apt-get update && apt-get install -y curl gpg apt-transport-https
RUN curl -fsSL https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | gpg --dearmor -o /usr/share/keyrings/datadog-archive-keyring.gpg
RUN echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable 7' > /etc/apt/sources.list.d/datadog.list

# Install the Datadog Agent (DD_API_KEY and DD_SITE are read from the environment at runtime)
RUN apt-get update && apt-get install -y datadog-agent

After setting up the buildpack or Docker configuration, you’ll need to define environment variables to enable Datadog’s features.

Setting Up Environment Variables

Here’s a table of the key environment variables you need to configure:

Variable      Purpose                         Example Value
DD_API_KEY    Authenticates with Datadog      Your API key
DD_SITE       Specifies the Datadog region    datadoghq.com
DD_DYNO_HOST  Uses the dyno name as hostname  true
DD_TAGS       Adds custom tags to your data   env:production,team:backend

Set these variables using the Heroku CLI. For example:

heroku config:set DD_API_KEY=your_api_key
heroku config:set DD_SITE=datadoghq.com
heroku config:set DD_DYNO_HOST=true
heroku config:set HEROKU_APP_NAME=your_app_name

Testing Your Setup

After installation, you can verify that everything is working correctly by using the Datadog Agent's built-in status checks.

  1. Access your dyno:
    heroku ps:exec -a your_app_name
    
  2. Run the following command to check the Agent's status:
    agent-wrapper status
    

The status output should confirm:

  • A valid API key is in use.
  • Collectors for any enabled integrations are active.
  • The APM Agent is running (if you’ve configured it).

If you notice missing data, review the Agent’s status output carefully. Double-check the following:

  • Configuration files located at /app/.apt/etc/datadog-agent
  • Log files stored at /app/.apt/var/log/datadog
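
For a quick look at both from inside a dyno, something like the following works (the exact log file name can vary by Agent version):

heroku ps:exec -a your_app_name
cat /app/.apt/etc/datadog-agent/datadog.yaml
tail -n 100 /app/.apt/var/log/datadog/datadog.log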

Essential Heroku Metrics to Track

Keep an eye on key Heroku metrics using Datadog to ensure your app runs smoothly and efficiently.

Dyno Performance Metrics

Tracking dyno performance is crucial for understanding how your app responds under different conditions. Focus on these metrics:

Metric Type    Key Indicators
Response Time  Median (P50), 95th percentile (P95), 99th percentile (P99)
Memory Usage   Resident Set Size (RSS), swap, total memory usage
CPU Usage      Average and maximum usage (for Cedar-generation apps)
Throughput     Request volume by HTTP status code

If your app consistently experiences high load, it might be time to consider scaling - for example, from the CLI (process type and dyno count are placeholders):
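
heroku ps:scale web=3 -a your_app_name

But don't stop here - system-wide metrics can give you a more comprehensive view.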

Heroku System Metrics

To get a full picture of your app's health, collect system-level metrics. Here's how to set it up:

  1. Enable Runtime Metrics. Use the following command to enable runtime metrics for your app:
    heroku labs:enable log-runtime-metrics -a your_app_name
    
  2. Configure Metric Collection. Tell Datadog to gather system metrics by setting this parameter:
    heroku config:set DD_DISABLE_HOST_METRICS=false
    
  3. Monitor Platform Indicators. Pay attention to critical platform-level indicators such as:
    • Database connection count (see the example after this list)
    • Queue depth and processing time
    • Add-on resource usage
    • Router response times
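
For instance, Postgres connection counts can be spot-checked from the CLI - the output includes a Connections line:

heroku pg:info -a your_app_name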

These metrics provide insights into your app's backend performance and help you address bottlenecks before they escalate.

Log Analysis

Performance metrics tell part of the story, but logs can fill in the gaps. Here's how to set up detailed log analysis:

  1. Set the Correct Source Parameter. Specify your application's type and service name in your configuration:
    source: ruby  # For Rails applications
    service: your-app-name
    
  2. Create Custom Facets. Define custom facets for the attributes that matter most to your app, such as:
    • Controller actions
    • Response times
    • Error types
    • User IDs

Datadog's built-in processing pipelines simplify log analysis by automatically parsing common patterns and extracting useful information. Keep an eye on these log patterns:

  • HTTP status codes, especially 5XX errors
  • Memory warnings like R14 errors (see the example after this list)
  • Database timeout errors
  • Router issues, including H10–H99 errors
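
For example, a log monitor query for memory warnings, in the same style as the log-based alert shown later (the service name is a placeholder):

query: service:your-app-name "Error R14"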

Setting Up Monitoring in Datadog

Learn how to configure monitoring with custom dashboards and alert rules to keep an eye on your system's performance.

Creating Heroku Dashboards

Using the Heroku metrics you've collected, you can build custom dashboards to get real-time insights into your system's health.

  1. System Performance Widgets

Start by setting the following environment variables to enable monitoring for Heroku Redis and Postgres:

DD_ENABLE_HEROKU_REDIS: true
DD_ENABLE_HEROKU_POSTGRES: true

  2. Application Metrics Panel

Add these widgets to your dashboard for a detailed view:

Widget Type  Metrics to Display           Update Interval
Time Series  CPU Usage, Memory Usage      15 seconds
Top List     Highest Response Times       30 seconds
Query Value  Active Database Connections  1 minute
Heat Map     Request Distribution         1 minute

  3. Resource Monitoring Section

To monitor your database resources, enable Database Monitoring for Postgres by setting the following variable:

DD_ENABLE_DBM=true

Once your dashboards are in place, the next step is to define alert rules that help you address potential issues before they escalate.

Setting Alert Rules

With your dashboards ready, configure alert rules to detect and respond to performance issues early.

Alert Category  Example Threshold       Notification Priority
CPU Usage       >80% for 10 minutes     High
Error Rate      >1% over 15 minutes     High
Response Time   Based on your baseline  Medium

  1. Performance Degradation Alerts

Set up a metric monitor to track CPU usage:

monitor_type: metric
query: avg(last_10m):avg:heroku.dyno.cpu_usage{*} > 80

  2. Composite Monitors for System Health

Combine multiple metrics, such as high CPU usage and increased error rates, into a composite monitor for a more comprehensive view of system health.
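
As a sketch, a composite monitor combines existing monitors by ID - the IDs below are placeholders for your CPU and error-rate monitors:

monitor_type: composite
query: "12345678 && 23456789"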

  3. Log-Based Alerts

To capture critical errors, configure log levels and queries:

DD_LOG_LEVEL: WARN
query: "status:error service:your-app-name"

"Alerting is at the heart of proactive system monitoring, especially when managing dynamic environments that involve complex infrastructure." - Siddarth Jain

"Send a page only when symptoms of urgent problems in your system's work are detected, or if a critical and finite resource limit is about to be reached." - Alexis Lê-Quôc

Managing Costs and Performance

Controlling Monitoring Costs

Effective observability not only helps identify issues but also keeps your monitoring expenses under control. To make the most of your Heroku monitoring, it's essential to understand and optimize Datadog's pricing structure.

Here’s how you can manage costs without compromising on monitoring quality:

Monitoring Component  Optimization Strategy   Cost Impact
Log Management        Pre-sample logs at 50%  Saves $0.05/GB
Infrastructure        Consolidate workloads   Saves $15–$23 per host/month

For log management, you can use targeted exclusion filters to reduce unnecessary data processing. Here’s an example:

DD_LOGS_ENABLED: true
DD_LOGS_CONFIG_PROCESSING_RULES:
  - type: exclude_at_match
    name: exclude_dev_logs
    pattern: development

To further streamline costs:

  • Set log retention periods: Retain logs only as long as they’re necessary for compliance or analysis.
  • Enable metric filtering: Focus on critical business metrics and avoid collecting redundant data.
  • Automate cleanup: Regularly remove unused custom metrics to save resources.

By focusing on cost-efficiency, you can ensure your monitoring remains scalable and effective as your needs grow.

Monitoring for Growth

As your Heroku infrastructure evolves, your monitoring strategy should adapt to maintain visibility without a proportional rise in costs. A thoughtful approach to scaling ensures you stay efficient while supporting growth.

Here’s a quick breakdown of pricing considerations for key services:

Service Type     Base Price            Growth Strategy
APM              $31/host/month        Use sampling for high-traffic services
RUM              $1.50/1,000 sessions  Apply session-based filtering
Synthetic Tests  $5/10,000 API tests   Schedule tests during peak hours only

Dynamic sampling can be instrumental in managing costs for growing applications. For example, using the tracer's sampling settings (a global rate plus a per-service rule):

DD_TRACE_SAMPLE_RATE: "0.5"
DD_TRACE_SAMPLING_RULES: '[{"service": "high-volume-api", "sample_rate": 0.1}]'

Additional growth-focused strategies include:

  • Consumption-based pricing: Adjust monitoring intensity based on workload fluctuations.
  • Automated shutdowns: Disable non-production monitoring during off-hours to save resources.
  • Metric collection limits: Restrict data collection for non-critical namespaces.

To optimize resources, consider reducing the frequency of synthetic tests during low-traffic periods while ensuring critical systems remain monitored.

"Alerting is at the heart of proactive system monitoring, especially when managing dynamic environments that involve complex infrastructure." - Siddarth Jain

"Send a page only when symptoms of urgent problems in your system's work are detected, or if a critical and finite resource limit is about to be reached." - Alexis Lê-Quôc

Summary

This section brings together the key aspects of setting up, tracking metrics, and managing costs for full-stack observability on Heroku using the Datadog add-on. Achieving effective monitoring requires careful configuration and consistent upkeep to ensure everything runs smoothly without overspending.

Here’s a quick recap of the essential configuration settings that support robust monitoring:

Component              Critical Setting    Impact on Observability
System Integration     Service Discovery   Unified metric collection
Performance Analysis   Trace Sampling      Balanced depth of insights
Resource Optimization  Targeted Filtering  Streamlined data management

By fine-tuning these configurations, you not only gain valuable insights but also keep costs under control. One standout practice is using unified service tagging through a prerun script. This step ensures seamless correlation between host metrics and APM data - especially useful for organizations juggling multiple dynos.
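
A minimal sketch of such a prerun script - the Datadog buildpack runs /app/datadog/prerun.sh before starting the Agent; the tag values here are placeholders, and HEROKU_SLUG_COMMIT requires Heroku's dyno metadata feature:

#!/usr/bin/env bash
# /app/datadog/prerun.sh - runs before the Datadog Agent starts
# Apply unified service tags so host metrics and APM traces correlate
export DD_ENV="${DD_ENV:-production}"
export DD_SERVICE="${DD_SERVICE:-$HEROKU_APP_NAME}"
export DD_VERSION="${DD_VERSION:-$HEROKU_SLUG_COMMIT}"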

Key Configurations to Remember:

  • Buildpack Order: Place the Datadog buildpack before any language-specific buildpacks.
  • Environment Variables: Set critical variables like API keys and dyno host settings.
  • Resource Management: Keep an eye on agent memory usage and make adjustments as needed.

"Datadog has become the industry standard for observability. Engineers rely on it to monitor their infrastructure, troubleshoot issues, and build better systems - but while they're focused on shipping the next feature or solving a production issue, Datadog costs are quietly stacking in the background."
– Shouri Thallam, nOps

These insights tie back to earlier steps, highlighting the balance between effective monitoring and managing system performance alongside costs.

FAQs

How can I optimize my Heroku app for efficient and cost-effective monitoring with Datadog?

To make monitoring your Heroku app with Datadog both efficient and budget-friendly, there are a few practical strategies to keep in mind:

  • Filter unnecessary logs: Cut down on log ingestion by sending only the logs that truly matter to Datadog. This reduces clutter and saves on costs.
  • Streamline custom metrics: Combine data points where possible and avoid creating metrics with excessive granularity or high-cardinality values.
  • Leverage consumption-based pricing: Datadog’s pricing model works well for apps with temporary or fluctuating workloads, so align your usage to take full advantage.

You might also want to explore committed use discounts and group metrics thoughtfully to strike the right balance between cost management and maintaining clear insights into your app’s performance. These adjustments can help you keep monitoring effective without straining your budget.

What important metrics and alerts should I focus on when using Datadog for full-stack observability on Heroku?

When using Datadog to implement full-stack observability on Heroku, it's crucial to zero in on metrics that directly affect your app's performance and reliability. Start by tracking response times, error rates, and throughput - these give you a clear picture of your application's health. On top of that, monitor Heroku-specific metrics like dyno load, memory usage, and queue wait times to address platform-specific concerns.

Set up alerts for critical events like service crashes, high error rates, or unexpected traffic surges. Make sure these alerts are precise and actionable to prevent overwhelming your team with unnecessary notifications. Including context in the alerts can also help your team react faster. For logs, keep an eye on patterns such as spikes in errors or signs of security risks to catch problems before they escalate. By honing in on these metrics and setting up well-thought-out alerts, you can keep your Heroku-hosted applications running smoothly and minimize disruptions.

What should I do if my Datadog setup on Heroku isn't capturing all the metrics I need or shows missing data?

If your Datadog setup on Heroku isn't tracking all the metrics you need, the first step is to confirm that the Datadog Agent is running as expected within your app. Look for any errors in the Agent's status, and double-check that your Datadog API key, application name, and site are correctly set as environment variables.

Next, make sure the Datadog buildpack is properly added to your Heroku app. It should be listed after any buildpacks that install apt packages or make changes to the /app directory. If you're using services like Postgres or Redis, review their configurations and credentials to ensure everything is set up correctly. Anytime you adjust buildpacks or environment variables, remember to recompile your slug so the updates take effect.

Still running into problems? Compare your setup with the official Datadog documentation. This can help you spot any configuration issues and ensure your app is fully monitored for better performance.
