Bloated indexes and tables in a Postgres database are usually caused by a lack of VACUUMing or a badly structured database. Index bloat can slow down your database and consume more storage space than necessary, so it is important to find out whether your database suffers from it. A plain VACUUM marks dead rows in each page as reusable; only when the trailing pages of an object contain no live rows at all can that space be truncated and returned to the filesystem. If even a single live row remains on a page, or the page is not at the end of the object, a normal VACUUM never returns that disk space to the filesystem. This leftover, unreturned space is what is commonly referred to as bloat.
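As a first rough check, you can look at dead-tuple counts per table using the built-in statistics views (a sketch; the numbers are estimates maintained by the statistics collector, not exact counts):

```sql
-- Tables with the most dead (reclaimable-but-not-yet-reclaimed) rows.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A table with a high `n_dead_tup` relative to `n_live_tup` and an old (or NULL) `last_autovacuum` is a good candidate for closer inspection.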
If you issue VACUUM with the special FULL flag, Postgres completely rewrites the table and its indexes into new pages. Every time a query runs against a table, the visibility flags on individual rows and index entries are checked to determine whether each one is actually visible to that transaction; on large, bloated tables this adds up to a lot of wasted time. The symptom is easy to miss: you expect query performance to improve and it simply doesn't. This is why index bloat has to be checked independently of table bloat.
It is a good idea to run VACUUM together with ANALYZE, either as a single command or as two separate commands. ANALYZE updates the internal statistics that Postgres uses when it creates its query plans: the planner decides how to run a query based on, among other things, the number of live and dead rows in each table.
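For example (a sketch; `mytable` is a placeholder table name):

```sql
-- One command: reclaim dead-row space in-page and refresh planner statistics.
VACUUM (VERBOSE, ANALYZE) mytable;

-- Or as two separate commands:
VACUUM mytable;
ANALYZE mytable;
```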
Open your postgresql.conf file and uncomment these lines:
# enable autovacuum and log every run that takes longer than 1 second
autovacuum = on
log_autovacuum_min_duration = 1000
The scale factors control how often autovacuum processes each table. If you need your tables vacuumed and analyzed more often, reduce the fractions (e.g. from 0.2 down to 0.02); raise them to prevent autovacuum from running too often.

autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
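The settings above are global. If only a few large tables need more aggressive treatment, you can override the scale factor per table instead of lowering it for the whole cluster (a sketch; `big_table` is a placeholder name):

```sql
-- 0.02 means: autovacuum this table after roughly 2% of its rows change,
-- instead of the global default of 20%.
ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.02);
```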
A “normal” VACUUM wouldn’t help in this case because it doesn’t release the pages to the OS; they are only marked as free in the FSM (Free Space Map).
One solution would be to run

VACUUM FULL VERBOSE ANALYZE table_name;

Keep in mind that VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table for the duration of the rewrite, blocking all reads and writes.
You should also remember to “defragment” bloated indexes with
REINDEX INDEX indexname;   -- recreate a single index explicitly
REINDEX TABLE tablename;   -- recreate all indexes on this table
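Note that a plain REINDEX blocks writes to the table while the index is being rebuilt. On PostgreSQL 12 and later you can rebuild with a much weaker lock:

```sql
-- Rebuilds the index without blocking concurrent writes (PostgreSQL 12+).
REINDEX INDEX CONCURRENTLY indexname;
```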
Tools you can use
One thing to keep in mind is that the tool in question works by doing a lot of updates on the table. You have to be cautious when using it, especially with “ON UPDATE” triggers: your triggers should be written so that they do nothing if the update doesn’t actually change any data in the row.
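One common way to make a trigger safe against such no-op updates is to bail out early when nothing changed (a sketch; `my_audit_trg` and its body are hypothetical):

```sql
CREATE OR REPLACE FUNCTION my_audit_trg() RETURNS trigger AS $$
BEGIN
  -- Skip updates that change nothing (e.g. the ones pgcompact issues).
  IF NEW IS NOT DISTINCT FROM OLD THEN
    RETURN NEW;
  END IF;
  -- ... actual trigger logic goes here ...
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

Alternatively, a `WHEN (OLD.* IS DISTINCT FROM NEW.*)` clause on the CREATE TRIGGER statement achieves the same thing without touching the function body.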
Get the tool using git clone
$ git clone https://github.com/grayhemp/pgtoolkit.git
Cloning into 'pgtoolkit'...
remote: Counting objects: 1925, done.
remote: Compressing objects: 100% (277/277), done.
remote: Total 1925 (delta 1220), reused 1915 (delta 1210)
Receiving objects: 100% (1925/1925), 306.22 KiB | 327 KiB/s, done.
Resolving deltas: 100% (1220/1220), done.
The script can use the pgstattuple extension to calculate accurate bloat statistics, so we create it first:
$ create extension pgstattuple;
CREATE EXTENSION
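With the extension in place, you can also query bloat for a single table directly (assuming the `test` table used below):

```sql
-- dead_tuple_percent and free_percent indicate how bloated the table is.
SELECT table_len, dead_tuple_percent, free_percent
FROM pgstattuple('test');
```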
Let’s skip the details and move on to the usage:
$ time pgtoolkit/bin/pgcompact -v info -d depesz -t test -p 5930 2>&1 | tee pgcompac
Tue Oct 15 15:59:43 2013 INFO Database connection method: psql.
Tue Oct 15 15:59:43 2013 depesz INFO Created environment.
Tue Oct 15 15:59:43 2013 depesz INFO Statictics calculation method: pgstattuple.
Tue Oct 15 15:59:43 2013 depesz, public.test INFO Vacuum initial: 169614 pages left, duration 0.359 seconds.
Tue Oct 15 15:59:44 2013 depesz, public.test INFO Bloat statistics with pgstattuple: duration 0.384 seconds.
Tue Oct 15 15:59:44 2013 depesz, public.test NOTICE Statistics: 169614 pages (218154 pages including toasts and indexes), approximately 94.64% (160521 pages) can be compacted reducing the size by 1254 MB.
Tue Oct 15 15:59:44 2013 depesz, public.test INFO Update by column: id.
Tue Oct 15 15:59:44 2013 depesz, public.test INFO Set pages/round: 10.
Tue Oct 15 15:59:44 2013 depesz, public.test INFO Set pages/vacuum: 3393.
Tue Oct 15 15:59:59 2013 depesz, public.test INFO Cleaning in average: 806.4 pages/second (0.012 seconds per 10 pages).
Tue Oct 15 16:00:00 2013 depesz, public.test INFO Vacuum routine: 166209 pages left, duration 0.680 seconds.
...
Tue Oct 15 16:12:16 2013 depesz, public.test INFO Set pages/vacuum: 556.
Tue Oct 15 16:12:16 2013 depesz, public.test INFO Set pages/round: 9.
Tue Oct 15 16:12:17 2013 depesz, public.test INFO Vacuum final: can not clean 171 pages, 8893 pages left, duration 0.185 seconds.
Tue Oct 15 16:12:43 2013 depesz, public.test INFO Analyze final: duration 25.963 second.
Tue Oct 15 16:12:43 2013 depesz, public.test INFO Bloat statistics with pgstattuple: duration 0.029 seconds.
Tue Oct 15 16:12:43 2013 depesz, public.test NOTICE Processing complete.
Tue Oct 15 16:12:43 2013 depesz, public.test NOTICE Processing results: 8893 pages left (57392 pages including toasts and indexes), size reduced by 1256 MB (1256 MB including toasts and indexes) in total.
Tue Oct 15 16:12:43 2013 depesz NOTICE Processing complete.
Tue Oct 15 16:12:43 2013 depesz NOTICE Processing results: size reduced by 1256 MB (1256 MB including toasts and indexes) in total.
Tue Oct 15 16:12:43 2013 NOTICE Processing complete: 0 retries from 10.
Tue Oct 15 16:12:43 2013 NOTICE Processing results: size reduced by 1256 MB (1256 MB including toasts and indexes) in total, 1256 MB (1256 MB) depesz.
Tue Oct 15 16:12:43 2013 depesz INFO Dropped environment.

real    13m0.881s
user    0m11.860s
sys     0m2.188s
$ select relname, relkind, pg_size_pretty(pg_table_size(oid)) as size
  from pg_class
  where relname ~ '^(i[0-9]|test|test_pkey)$'
  order by pg_relation_size(oid) desc;
  relname  | relkind |  size
-----------+---------+--------
 i6        | i       | 242 MB
 test      | r       | 70 MB
 i3        | i       | 30 MB
 i1        | i       | 21 MB
 i2        | i       | 21 MB
 i4        | i       | 21 MB
 i5        | i       | 21 MB
 test_pkey | i       | 21 MB
(8 rows)
All in all, it works pretty well. 800 pages per second works out to roughly 6.5 MB/s (at 8 kB per page). Maybe not the fastest thing on earth, but faster than expected.