I am glad to report that our problems are fixed and that our server is back to humming along nicely. The following is posted here so that if some other souls find themselves in our situation, they may learn from our experience:
What we changed:
- It was pointed out that max_connections of 500 was in fact insanely high, especially in light of using PGbouncer. Before we used PGbouncer we needed a lot more connections and when we started using PGbouncer, we never reduced this number.
- Our PGbouncer server_lifetime was set far too high (1 hour). Josh Berkus suggested lowering it to 5 minutes.
- We reduced the number of PGbouncer active connections to the DB (a sketch of the relevant settings follows this list).
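For anyone who wants the concrete numbers, here is a rough sketch of what those settings look like after the change, using the values we mention further down in the comments (the right numbers for your own setup will depend on your hardware and workload):

# postgresql.conf (was 500)
max_connections = 100

; pgbouncer.ini (server_lifetime was previously 1 hour; both values are in seconds)
server_lifetime=300
server_idle_timeout=300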
What we learned:
- We had too many backends.
- The backends were being kept around for too long by PGbouncer.
- This caused too many idle backends to kick around (see the query after this list). Once we exhausted physical RAM, we started swapping.
- Linux 3.2 apparently has some less than desirable swap behaviours. Once we started swapping, everything went nuts.
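If you want to check whether the same thing is happening to you, the backend counts are visible in pg_stat_activity. On PostgreSQL 9.2 and later (which exposes a state column) something like this shows how many backends are sitting idle:

-- count backends by state; a large 'idle' count means connections are being held open
SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;

On older releases the same information is in the current_query column, where idle backends show up as '<IDLE>'. Every backend holds real memory, which is how an idle pile-up eventually pushes a box into swap.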
Going forward, we’re going to upgrade our kernel the next time we have downtime for our site; the rest should be sorted now.
Finally, a word about Postgres itself:
Postgres rocks our world. I’m immensely pleased that once again the problems were our own stupidity and not Postgres’ fault. In over 10 years of using Postgres, problems with our site have never been Postgres’ fault. Not once.
Thanks to everyone who helped us through this tough time!
Congrats!
Just for my records:
Who is Josh Berkus?
He is one of the driving forces behind Postgres and a fellow GSoC mentor.
I am on the PG list that was used to help diagnose the issue. In addition to a great Postgres product, there is a wonderful safety net in place: selfless contributors giving their time and knowledge. It’s comforting to see the commitment this community has to Open Source principles and to helping others achieve greatness. PG does indeed rock! Kudos as well to the MusicBrainz team for transparency and for setting an example of how to be “helpable”.
Glad to read this, and very happy to see that you can relax now.
Thanks for your hard work! 🙂
Thanks for this post. It’s very insightful. There isn’t a lot of high-scalability info out there for pgbouncer, so your findings are very useful.
> Before we used PGbouncer we needed a lot more connections and when we started using PGbouncer, we never reduced this number.
What did you reduce that number to?
> We reduced the number of PGbouncer active connections to the DB.
What does your number of pgbouncer active connections look like now?
Thanks!
We reduced max_connections to 100, which is where we had it before we used pgbouncer. I believe the recommendation is 2 * number of cores, which for us ought to be 48, but 100 seems to be fine for us now.
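If anyone wants to sanity-check their own server, you can confirm the live value from psql with:

SHOW max_connections;

The 2 * cores figure is just a rule of thumb (2 * 24 cores = 48 in our case), and with pgbouncer sitting in front of the database, 100 leaves us plenty of headroom.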
We reduced the number of active connections by adding these timeout statements to our pgbouncer.ini:
server_lifetime=300
server_idle_timeout=300
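Both of those values are in seconds (300 = 5 minutes). If you want to confirm they took effect, you can connect to pgbouncer’s admin console (the special pgbouncer database, on port 6432 by default) and run:

SHOW CONFIG;
SHOW SERVERS;

SHOW CONFIG lists the settings pgbouncer is actually running with, and SHOW SERVERS shows how many server connections are currently open to Postgres.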
I hope this helps.
Very helpful!
Thanks.