Troubleshooting performance

Use the following guidance to troubleshoot problems with IBM Cloud® Databases for PostgreSQL performance.

For a general overview, see PostgreSQL performance.

Slow database performance

The typical symptoms of this problem are as follows:

  • High disk IO (input/output) or memory usage
  • Slow queries

The most common cause is slow queries. Complete the following steps to determine whether this is the case:

  1. List the long running queries:

    select age(now(),query_start), pid, client_addr, state, substring(query,0,80) from pg_stat_activity where state != 'idle' order by 1 desc;
    
  2. Validate query blocking:

    select pg_blocking_pids(pid), pid AS blocked_pid, state, query, age(now(),query_start) AS duration from pg_stat_activity where array_length(pg_blocking_pids(pid),1) > 0 order by 1,2;
    
  3. Check the query execution plan:

    EXPLAIN <query>;
    
  4. Check whether database statistics are missing. Run the following query to validate. To refresh statistics, select Action > Run Analyze in the UI, or run the ANALYZE command:

    select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze, analyze_count, autoanalyze_count from pg_stat_all_tables where schemaname='public';
    ANALYZE <tablename>;
    
  5. Check whether the index is missing from the table. PostgreSQL provides index methods such as B-Tree, hash, GiST, SP-GiST, GIN, and BRIN. Use one of these methods to create the index; a sample CREATE INDEX statement is shown after this procedure.

  6. Check whether the table is bloated. Run the following query to validate; to reclaim space from a bloated table, select Action > Run Vacuum in the UI:

    SELECT
        relname AS table_name,
        pg_size_pretty(pg_relation_size(c.oid)) AS actual_size,
        ROUND(
            CASE
                WHEN c.relpages = 0 THEN 0
                ELSE (pg_relation_size(c.oid) - (c.relpages *
                    (current_setting('block_size')::numeric - 24))) * 100 / pg_relation_size(c.oid)
            END
        , 2) AS bloat_percentage,
        pg_size_pretty(
            CASE
                WHEN c.relpages = 0 THEN 0
                ELSE (pg_relation_size(c.oid) - (c.relpages *
                    (current_setting('block_size')::numeric - 24)))
            END
        ) AS bloat_size
    FROM pg_class c
    LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE relkind = 'r' AND n.nspname = 'public' -- Adjust schema if needed
    ORDER BY bloat_size DESC NULLS LAST;
    
  7. Consider whether the database or objects were created with pg_dump or pg_restore. If so, use Action > Update stats to refresh the statistics.

  8. If the query performs a sort operation, ensure that sufficient work_mem is allocated; you can raise it for the current session as shown in the example after this procedure.

  9. Use pg_stat_statements to find out the following information:

    • query average execution time and number of calls

    • CPU percentage usage by the queries

    • memory percentage usage by the queries
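
The following sketch illustrates steps 5 and 8. The orders table, the customer_id column, and the 64MB value are hypothetical examples rather than recommendations; adjust them to your own schema and workload.

-- Step 5: create a B-Tree index on a column that appears in slow WHERE clauses
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_id ON orders USING btree (customer_id);

-- Step 8: raise work_mem for the current session before running a sort-heavy query
SET work_mem = '64MB';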

pg_stat_activity, pg_stat_bgwriter, and pg_buffercache are important tools in PostgreSQL for monitoring and understanding database performance.
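
For example, the following statements (a minimal sketch that uses standard PostgreSQL statistics views; some column names vary between PostgreSQL versions) show non-idle sessions with their wait events and the background writer activity:

-- Non-idle sessions and what they are waiting on
select pid, state, wait_event_type, wait_event, age(now(), query_start) as duration from pg_stat_activity where state != 'idle';

-- Background writer and checkpoint activity (column layout differs in PostgreSQL 17 and later)
select * from pg_stat_bgwriter;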

Deployment monitoring and database load monitoring

Databases for PostgreSQL deployments offer an integration with the IBM Cloud Monitoring service for monitoring of resource usage on your deployment. Use this service to monitor your deployment (disk and memory) and your database load.

Use Cloud Databases dashboards to set alerts on CPU, memory, and disk IOPS thresholds. Many of the available metrics, like disk usage and IOPS, are useful to help you configure autoscaling on your deployment. Autoscaling is not enabled by default, so you must configure it manually.

Observing trends in your usage and configuring autoscaling to respond to those trends can help alleviate performance problems before your databases become unstable because of resource exhaustion. For example, changes in disk IO, CPU usage, or memory faults can result in performance issues.

Disk IOPS

The number of input/output operations per second (IOPS) is limited by the type and size of storage volume. Storage volumes for Databases for PostgreSQL deployments are provisioned on Block Storage Endurance Volumes in the 10 IOPS per GB tier.

If your operational load saturates or exceeds the IOPS limit, database requests and operations are delayed until the storage subsystem can catch up. Extended periods of heavy load can cause your deployment to be unable to process queries and become effectively unavailable. You can increase the number of IOPS available to your deployment by increasing the disk size. For example, in the 10 IOPS per GB tier, you increase IOPS by increasing the volume size.

Extended periods of even 40-50% disk utilization can have a significant negative impact on database performance. Allocate at least 100 GB of disk (1,000 IOPS) for production environments. IOPS = 10 × allocated GB (for example, 100 GB = 1,000 IOPS).

Memory usage

Memory is the fastest and most efficient way to access and process data. Because of this, a database almost always performs faster using memory instead of reading data from disk. There are other metrics to look at and consider like cache hits, blocks read, and blocks hit. Just seeing 100% memory usage is not by itself a cause for concern. If memory is exhausted and swap pages are used, performance can degrade significantly.
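
As one way to check how often data is served from memory rather than disk, you can compute the buffer cache hit ratio per database from the standard pg_stat_database view (a minimal sketch; acceptable thresholds depend on your workload):

select datname, blks_hit, blks_read, round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) as cache_hit_pct from pg_stat_database where datname is not null order by cache_hit_pct;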

You can set the amount of memory that is dedicated to the database's shared buffer pool by adjusting shared_buffers in your Databases for PostgreSQL configuration. The maximum recommended value is 25% of the deployment's total memory. Allocating too much memory to the shared buffer pool can starve the system of memory for other purposes, can hinder performance, or possibly even disable the database.
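
For example, on a deployment with 8 GB of memory per member (an illustrative size), 25% corresponds to a shared buffer pool of about 2 GB. You can check the current value from a psql session:

SHOW shared_buffers;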

Database load monitoring

You have two options to check database load:

Option 1. Check for long running queries by using IBM Cloud Logs (ICL):

For more information, see How do I track query history?

You can use the following sample DataPrime query in IBM Cloud Logs.

  1. After IBM Cloud Logs loads, switch to the </>DataPrime tab. Do not change anything in the </>Lucene search bar.
  2. From the </>DataPrime tab, run the following search to find SQL statements that run for more than 1000 milliseconds:
source logs | filter message.attr.durationMillis >= 1000

Option 2. Enable the log_min_duration_statement setting

Setting log_min_duration_statement causes statements that run longer than the specified number of milliseconds to be logged. For more information, see log_min_duration_statement.
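
For example, you can check the current value from a psql session. (On Databases for PostgreSQL, the value is typically changed through the deployment configuration rather than with SQL.)

SHOW log_min_duration_statement;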

You can also install the pg_stat_statements extension:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

This extension allows you to query the statistics to find examples of queries that are particularly long-running. You can then isolate issues and consider a different query pattern, new or modified indexes, different table design, or other strategies to improve performance.

Query 1: Identify time-consuming queries

You can spot time-consuming queries by running the following statement:

SELECT
   username,
   database,
   queryid,
   query_preview,
   calls,
   total_exec_time,
   pct_exec_time,
   ROUND(SUM(pct_exec_time) OVER (ORDER BY total_exec_time DESC)) AS cum_pct_exec_time,
   avg_exec_time
FROM (
   SELECT
      pu.usename AS username,
      pd.datname AS database,
      pss.queryid,
      LEFT(pss.query, 50) AS query_preview,
      pss.calls,
      ROUND(pss.total_exec_time::numeric, 3) AS total_exec_time,
      ROUND((100.0 * pss.total_exec_time / SUM(pss.total_exec_time) OVER ())::numeric, 2) AS pct_exec_time,
      ROUND(pss.mean_exec_time::numeric, 3) AS avg_exec_time
   FROM
      pg_stat_statements pss
      JOIN pg_user pu ON pss.userid = pu.usesysid
      JOIN pg_database pd ON pss.dbid = pd.oid
) AS subquery
ORDER BY
   total_exec_time DESC
LIMIT 25;

This statement produces the following result:

Result from time-consuming query statement
 username |  database  | queryid |      query_preview       | calls  | total_exec_time | pct_exec_time | cum_pct_exec_time | avg_exec_time
----------+------------+---------+--------------------------+--------+-----------------+---------------+-------------------+---------------
 ibm      | postgres   |         | <insufficient privilege> |  50685 |       38286.580 |         19.52 |                20 |         0.755
 ibm      | postgres   |         | <insufficient privilege> | 280111 |       28477.951 |         14.52 |                34 |         0.102
 ibm      | postgres   |         | <insufficient privilege> |     18 |       14568.978 |          7.43 |                41 |       809.388
 ibm      | postgres   |         | <insufficient privilege> |     18 |       12103.904 |          6.17 |                48 |       672.439
 ibm      | ibmclouddb |         | <insufficient privilege> |  37552 |        7799.984 |          3.98 |                52 |         0.208
(5 rows)

The query tracks the execution statistics of SQL statements and provides the following information:

  • Shows the 25 queries that took the most time in total to run
  • Displays who ran the query, on which database, and a preview of the query
  • Shows how often each query was run and how long it took on average
  • Calculates the percentage of total database time each query used
  • Provides a running total of database time used

Query 2: Identify frequently run queries

You can spot frequently run queries by running the following statement:

SELECT
   username,
   database,
   queryid,
   query_preview,
   calls,
   pct_calls,
   ROUND(SUM(pct_calls) OVER (ORDER BY calls DESC)) AS cum_pct_calls,
   total_exec_time,
   avg_exec_time
FROM (
   SELECT
      pu.usename AS username,
      pd.datname AS database,
      pss.queryid,
      LEFT(pss.query, 50) AS query_preview,
      pss.calls,
      ROUND(100.0 * pss.calls / SUM(pss.calls) OVER (), 2) AS pct_calls,
      ROUND(pss.total_exec_time::numeric, 3) AS total_exec_time,
      ROUND(pss.mean_exec_time::numeric, 3) AS avg_exec_time
   FROM
      pg_stat_statements pss
      JOIN pg_user pu ON pss.userid = pu.usesysid
      JOIN pg_database pd ON pss.dbid = pd.oid
) AS subquery
ORDER BY
   calls DESC
LIMIT 25;

This statement produces the following result:

Result from frequently run query statement
 username | database | queryid |      query_preview       | calls | pct_calls | cum_pct_calls | total_exec_time | avg_exec_time
----------+----------+---------+--------------------------+-------+-----------+---------------+-----------------+---------------
 ibm      | postgres |         | <insufficient privilege> | 80285 |     23.84 |            24 |       16095.420 |         0.200
 ibm      | postgres |         | <insufficient privilege> | 36832 |     10.94 |            35 |         548.787 |         0.015
 ibm      | postgres |         | <insufficient privilege> | 20626 |      6.12 |            41 |         741.755 |         0.036
 ibm      | postgres |         | <insufficient privilege> | 14702 |      4.36 |            45 |       23982.252 |         1.631
 ibm      | postgres |         | <insufficient privilege> | 12436 |      3.69 |            49 |        1750.426 |         0.141

The query provides the following information:

  • Shows the 25 queries that were run most often
  • Displays who ran the query, on which database, and a preview of the query
  • Shows how many times each query was run and how long it took in total and on average
  • Calculates the percentage of all query calls each query represents
  • Provides a running total of query calls

Get current performance of all queries

  1. To obtain the current performance across all queries, run the following statement:

    select now() as t1, sum(total_exec_time) as et1, sum(calls) as c1 from pg_stat_statements;
    

    This produces the following result:

    Result from performance of all current queries statement
                   t1              |        et1         |   c1
    -------------------------------+--------------------+--------
     2025-09-30 08:15:43.898157+00 | 104109.55454499979 | 336889
    (1 row)
  2. Then wait 10 seconds and run the following statement:

    select now() as t2, sum(total_exec_time) as et2, sum(calls) as c2 from pg_stat_statements;
    

    This produces the following result:

    Result from select now() statement
                   t2              |        et2         |   c2
    -------------------------------+--------------------+--------
     2025-09-30 08:16:18.943609+00 | 104113.70054399982 | 336896
    (1 row)
  3. Calculate the queries per second:

    Queries per second = (c2 - c1) / (t2 - t1, in seconds)
    Average query time (in milliseconds) = (et2 - et1) / (c2 - c1)
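
    Using the sample results above: c2 - c1 = 336896 - 336889 = 7 calls, t2 - t1 is about 35.05 seconds, and et2 - et1 is about 4.146 ms, so the deployment served roughly 7 / 35.05 ≈ 0.2 queries per second with an average query time of about 4.146 / 7 ≈ 0.59 ms.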
    

Top 10 time-consuming queries

To obtain the top 10 time-consuming queries:

SELECT query, calls, total_exec_time/calls as avg_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

This produces the following result:

Result from top 10 time-consuming queries statement
           query          | calls |      avg_time
--------------------------+-------+---------------------
 <insufficient privilege> | 14706 |  1.6311896077791408
 <insufficient privilege> | 80305 |  0.2004813740987463
 <insufficient privilege> |     6 |   800.7297508333332
 <insufficient privilege> |     6 |   721.1415835000001
 <insufficient privilege> | 10316 |  0.3907427791779766
 <insufficient privilege> | 10316 | 0.35429842613416024
 <insufficient privilege> | 10316 |  0.3184477990500202
 <insufficient privilege> | 10316 |  0.2285179489143081
 <insufficient privilege> | 12439 |  0.1407493980223488
 <insufficient privilege> | 10316 |  0.1605010507948812
(10 rows)

Further actions you can take to improve performance

You can also consider the following actions to troubleshoot performance:

  • Optimize slow queries

    • Run EXPLAIN (ANALYZE, BUFFERS) on slow queries. This shows how the database runs the query and where it is slow.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM student WHERE student_id = 12345;
    
    • Look for:

      • Queries that use a lot of memory or disk space

      • Whether you need to add IBM Cloud resources, for example by scaling disk or memory for higher IOPS and more memory

  • Missing or inefficient indexes are a common cause of slow queries. Use EXPLAIN to identify sequential scans and consider adding indexes.

  • Run VACUUM to reclaim space from dead tuples and help maintain database health.

  • Consider using connection pooling to handle more connections. Databases for PostgreSQL sets the maximum number of connections to your database to 115. 15 connections are reserved for the superuser to maintain the state and integrity of your database, and 100 connections are available for you and your applications. After the connection limit is reached, any attempt to start a new connection results in an error. To prevent overwhelming your deployment with connections, use connection pooling, or scale your deployment and increase its connection limit; a quick way to check current connection usage is shown after this list. For more information, see Managing PostgreSQL connection pooling.

  • Check for any scheduled processes that run backups or batch uploads of data. For example, automatic backups are completed daily and kept with a simple retention schedule of 30 days. If a backup is stuck, check the Available backups section on the Cloud UI page of the database instance to identify it.

  • Check your IBM Cloud notifications for any maintenance, such as database patching.

  • If you believe this is a platform issue such as maintenance, contact IBM Support with the database CRN.
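
For example, to see how close you are to the connection limit, you can compare the current number of connections with the configured maximum from a psql session (a minimal sketch that uses standard PostgreSQL views):

-- Current connections compared with the configured maximum
select count(*) as current_connections, current_setting('max_connections') as max_connections from pg_stat_activity;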