237 Commits

Author SHA1 Message Date
Alexander Kukushkin
453e68637a Don't try to remove leader key when running ctl on the leader node (#302) 2016-09-19 13:33:24 +02:00
Alexander Kukushkin
5c8399e4fa Make sure data directory is empty before trying to restore backup (#307)
We make a number of attempts when trying to initialize a replica using
different methods. Any of these attempts may create and put something into
the data directory, which causes the next attempts to fail.
In addition to that, improve logging when creating a replica.
2016-09-19 13:32:27 +02:00
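
A minimal sketch of the idea in 5c8399e4fa, assuming each replica creation method is a callable that receives the data directory and returns True on success. The names `remove_directory_contents` and `create_replica` and the method signature are hypothetical and are not taken from Patroni's code.

```python
import os
import shutil


def remove_directory_contents(path):
    """Delete everything inside path but keep the directory itself."""
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            shutil.rmtree(full_path)
        else:
            os.remove(full_path)


def create_replica(data_dir, methods):
    """Try each (name, callable) pair in order.

    Returns the name of the method that succeeded, or None if all failed.
    """
    for name, method in methods:
        # A previous failed attempt may have left files behind,
        # so start every attempt from an empty directory.
        if os.path.isdir(data_dir):
            remove_directory_contents(data_dir)
        if method(data_dir):
            return name
    return None
```

Clearing the directory inside the loop, rather than once up front, is what keeps a half-finished first attempt from breaking the second one.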
Alexander Kukushkin
c2b91d0195 Merge branch 'master' of github.com:zalando/patroni into feature/disable-automatic-failover 2016-09-05 16:03:55 +02:00
Oleksii Kliukin
3f7fa4b41f Avoid retries when syncing replication slots. (#282)
Do not retry postgres queries that fetch, create and drop slots at the end of
the HA cycle. The complete run_cycle routine executes while holding the
async_executor lock. This lock is also used when scheduling operations like
reinit or restart in different threads. It looks like the CPython threading class
has fairness issues when multiple threads try to acquire the same lock and one of
them executes long-running actions while holding it: the others have little
chance of acquiring the lock in order. To get around this issue, the long action
(i.e. retrying the query) is removed.

Investigation by Ants Aasma and Alexander Kukushkin.
2016-09-02 17:00:37 +02:00
Alexander Kukushkin
db9b62b7ed Merge branch 'master' of github.com:zalando/patroni into feature/disable-automatic-failover 2016-09-01 11:09:09 +02:00
Alexander Kukushkin
33ff372ef6 Always try to rewind on manual failover 2016-09-01 11:08:26 +02:00
Oleksii Kliukin
46f1c5b690 Merge pull request #269 from zalando/feature/replica-info
Return replication information on the api
2016-08-31 13:58:19 +02:00
Ants Aasma
fa6bd51ad1 Appease Quantifiedcode about stylistic issues 2016-08-30 00:40:19 +03:00
Ants Aasma
e428c8d0fa Replace invalid characters in member names for replication slot names
PostgreSQL replication slot names may only consist of [a-z0-9_].
Invalid characters cause replication slot creation and standby startup to fail.
This change substitutes the invalid characters with underscores or unicode
codepoints. If multiple member names map to identical replication slot names,
the master log will contain a corresponding error message.

Motivated by wanting to use hostnames as member names. Hostnames often
contain periods and dashes.
2016-08-30 00:21:33 +03:00
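
The substitution described in e428c8d0fa can be sketched as follows. This is an illustration under assumptions, not the committed code: the commit message only says "underscores or unicode codepoints", so the choice to map periods and dashes to underscores, the `u` + hex encoding for other characters, the lowercasing step, and both function names are hypothetical. The sketch also does not handle PostgreSQL's identifier length limit.

```python
import re


def slot_name_from_member_name(member_name):
    """Map a member name to a string containing only [a-z0-9_]."""

    def replace_char(match):
        char = match.group(0)
        # Periods and dashes, common in hostnames, become underscores;
        # anything else becomes 'u' plus its Unicode code point in hex.
        return '_' if char in '-.' else 'u{:04x}'.format(ord(char))

    return re.sub(r'[^a-z0-9_]', replace_char, member_name.lower())


def find_slot_name_collisions(member_names):
    """Return {slot_name: [member names]} for slot names shared by several members."""
    by_slot = {}
    for name in member_names:
        by_slot.setdefault(slot_name_from_member_name(name), []).append(name)
    return {slot: names for slot, names in by_slot.items() if len(names) > 1}
```

With this mapping, `pg-1` and `pg.1` both become `pg_1`, which is the kind of collision the commit message says is reported in the master log.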
Alexander Kukushkin
74166e996c Fix tests and formatting 2016-08-25 10:09:32 +02:00
Feike Steenbergen
1fc8b43b36 Return replication information on the api
To enable better monitoring, it is useful to have replication statistics.
Addresses issue #261
2016-08-24 09:31:49 +02:00
Alexander Kukushkin
8ef7178ddf Refactor code dealing with database connection string/params (#255)
In the original code we were parsing/deparsing url-style connection
strings back and forth. That was not really resource greedy, but it was
rather annoying. Also, it was not really obvious how to switch all local
connections to unix-sockets (which would be preferable).

This commit isolates the different use-cases of working with connection
strings and minimizes the amount of code parsing and deparsing them. It also
introduces one new helper method in the `Member` object - `conn_kwargs`.
This method can accept a dict object with credentials (username and
password) as a parameter. It returns a dict object which can
be used by `psycopg2.connect` or for building connection urls for
pg_rewind, pg_basebackup or some other replica creation methods.

Params for local connections are built in the `_local_connect_kwargs`
method and could easily be changed to unix-socket later.
2016-08-10 10:19:52 +02:00
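
A rough sketch of what a `conn_kwargs`-style helper could look like, assuming the member stores a `postgres://host:port/dbname` URL. Only the method name `conn_kwargs` and its credentials dict come from the commit message; the `Member` class shown here, its attributes, the defaults, and the `connection_url` helper are hypothetical stand-ins.

```python
from urllib.parse import quote, urlparse


class Member:
    """Hypothetical stand-in for a cluster member that stores a connection URL."""

    def __init__(self, name, conn_url):
        self.name = name
        self.conn_url = conn_url

    def conn_kwargs(self, auth=None):
        """Return keyword arguments for connecting to this member."""
        parsed = urlparse(self.conn_url)
        kwargs = {
            'host': parsed.hostname,
            'port': parsed.port or 5432,
            'dbname': parsed.path.lstrip('/') or 'postgres',
        }
        if auth:
            # Credentials are supplied by the caller, not stored in the URL.
            kwargs['user'] = auth.get('username')
            kwargs['password'] = auth.get('password')
        return kwargs


def connection_url(kwargs):
    """Build a URL for command-line tools from the same dict."""
    return 'postgres://{0}:{1}@{2}:{3}/{4}'.format(
        quote(kwargs['user'], safe=''),
        quote(kwargs['password'], safe=''),
        kwargs['host'], kwargs['port'], kwargs['dbname'])
```

The URL is parsed once, and every consumer works from the resulting dict: it can be passed as `psycopg2.connect(**kwargs)` or turned back into a URL for an external tool.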
Oleksii Kliukin
c91eda8d78 Merge branch 'master' into feature/scheduled_restarts 2016-07-11 12:56:24 +02:00
Alexander Kukushkin
659f7617f5 New option: remove_data_directory_on_rewind_failure
One more try to fix pg_rewind
2016-07-05 12:11:15 +02:00
Oleksii Kliukin
8834f929aa Improve the unit tests/coverage. 2016-07-05 10:07:29 +02:00
Alexander Kukushkin
b84e22c4ea Implement more checks in the follow method
Such a situation should not happen in reality (the follow method is
not supposed to be called when the node is holding the leader lock and
postgres is running), but to be on the safe side it is better to
implement as many checks as possible, because this method could
potentially remove the data directory.
2016-07-04 10:56:37 +02:00
Alexander Kukushkin
ee529669d2 Start readonly when holding leader lock
Not starting postgres was causing a situation where there was no
master running...
2016-07-01 12:28:02 +02:00
Alexander Kukushkin
aa10f42913 checkpoint method returns string status message 2016-06-30 10:45:54 +02:00
Alexander Kukushkin
4b67008488 Try to cover as many pg_rewind corner cases as possible
rewind is not possible when:
1) trying to rewind from itself
2) leader is not reachable
3) leader is_in_recovery

All these cases were leading to removal of the data directory...
In all cases except 1) it should "retry" when the leader becomes
available and not is_in_recovery.
2016-06-29 14:29:31 +02:00
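
The three cases listed in 4b67008488 can be expressed as a small decision function. This is only a sketch: the function name, its parameters, and the string results are hypothetical, and how Patroni actually detects each condition is not shown in this log.

```python
def rewind_decision(own_name, leader_name, leader_reachable, leader_in_recovery):
    """Return 'skip', 'retry' or 'rewind' for the cases in the commit message."""
    if leader_name == own_name:
        # 1) a node cannot rewind from itself, and waiting will not change that
        return 'skip'
    if not leader_reachable:
        # 2) try again once the leader can be reached
        return 'retry'
    if leader_in_recovery:
        # 3) try again once the leader has left recovery
        return 'retry'
    return 'rewind'
```

None of the branches removes the data directory, which is the behaviour the commit set out to fix.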
Alexander Kukushkin
0318749b56 bugfix: api must report role=master during pg_ctl stop
In addition to that, make the pg_ctl --timeout option configurable.
If the stop or start doesn't succeed within the given timeout when demoting
the master, the role will be forcibly changed to 'unknown' and all needed
callbacks executed.
2016-06-28 14:14:42 +02:00
Alexander Kukushkin
bd1e658080 Bugfix: obviously sys.hexversion was one symbol shorter
plus remove some unneeded code
2016-06-17 12:18:41 +02:00
Alexander Kukushkin
57807ff337 Don't expose replication user/passwd in DCS 2016-06-15 09:34:04 +02:00
Alexander Kukushkin
c64170ef33 Extend list of postgres parameters controlled by Patroni
These parameters usually must be the same across all cluster nodes and
therefore must be set only via global configuration and always passed as
a list of postgres arguments (via pg_ctl) to make it impossible to
accidentally change them with 'ALTER SYSTEM'
2016-06-13 10:33:14 +02:00
Alexander Kukushkin
57c6641683 Reimplement pg_ctl status in python
subprocess.call was causing problems when the server was running under high
load.
2016-06-09 08:28:11 +02:00
Alexander Kukushkin
16771f37d5 Compare old and new user-defined-parameters to avoid reload
when parameters didn't change.
Plus get wal_segment_size from pg_settings instead of hardcoding its value.
2016-06-03 12:11:14 +02:00
Alexander Kukushkin
2e5ce4a303 "Smart" compare of postgres parameters
to decide whether we need to reload/restart
2016-06-02 16:34:34 +02:00
Alexander Kukushkin
b3ada161cf Implement possibility to configure retry_timeout globally
Previously it was hardcoded all over the place.
2016-05-31 10:30:53 +02:00
Alexander Kukushkin
e085c866dc Reshuffle acceptance tests
Move dynamic config tests from basic_replication to patroni_api
2016-05-30 11:30:41 +02:00
Alexander Kukushkin
342eec5c2f Bugfix: pg_rewind can work only with master 2016-05-25 20:50:28 +02:00
Alexander Kukushkin
7827951c8c Dynamic configuration 2016-05-25 14:17:05 +02:00
Alexander Kukushkin
d422e16aad Implement reload of config.yaml on SIGHUP
If some changes require a restart of postgres, patroni will expose the
`restart_pending` flag in DCS and via the REST API
2016-05-13 13:31:21 +02:00
Alexander Kukushkin
45a52e21f0 Write postgres options to postgresql.conf
Originally we were passing postgresql options as arguments of `pg_ctl
start`. It was nice and convenient because it doesn't require touching
configuration files, but this method has one significant drawback: it
wasn't possible to change the values of options which were passed as
arguments without a restart (even when the option requires only a
reload). Instead of passing options as arguments we will:
1) rename the original postgresql.conf to postgresql-base.conf
2) write options into postgresql.conf, which has `include
  'postgresql-base.conf'` on the third line, after a comment that this
  file is generated by Patroni and you should not change it manually
3) listen_addresses and port are still passed as arguments to
  pg_ctl (just to be foolproof against ALTER SYSTEM set port to 'random')

In addition to that this commit makes some attributes of `Postgresql`
class private (prefixes them with _)
2016-05-13 12:40:04 +02:00
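
Step 2 of 45a52e21f0 could be sketched like this. The function name, the exact wording of the two comment lines, and the decision to quote every value are assumptions made for illustration; the commit message only fixes the file names and the position of the `include` line.

```python
def render_postgresql_conf(parameters, base_name='postgresql-base.conf'):
    """Return the text of a generated postgresql.conf.

    The first two lines are a warning comment, so the include directive
    lands on the third line.
    """
    lines = [
        '# Do not edit this file manually!',
        '# It will be overwritten by Patroni!',
        "include '{0}'".format(base_name),
        '',
    ]
    for name, value in sorted(parameters.items()):
        # Single quotes inside a value are doubled, as postgresql.conf requires.
        escaped = str(value).replace("'", "''")
        lines.append("{0} = '{1}'".format(name, escaped))
    return '\n'.join(lines) + '\n'
```

Because the generated settings come after the `include`, they override anything set in postgresql-base.conf, and a changed value only needs a reload unless the parameter itself requires a restart.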
Alexander Kukushkin
36d187ee1f Remove data directory only if replica creation failed
And follow the right node after replica creation (it was following
the same node from which it took the backup)
2016-05-10 10:59:26 +02:00
Feike Steenbergen
fbf44d3219 Merge pull request #177 from zalando/feature/remove_pghba_magic
Remove pg_hba injection and filtering
2016-04-21 10:24:52 +02:00
Feike Steenbergen
28d5de17e1 Remove pg_hba injection and filtering
Previously we explicitly injected a replication record into pg_hba.conf.
This doesn't allow users to explicitly write their configurations.

This change will just write the lines specified by the user.
2016-04-20 11:06:36 +02:00
Oleksii Kliukin
309b5d4803 Remove restrictions on running pg_rewind.
Previously, pg_rewind was called only if a crashed master tried
to rejoin the cluster. It didn't cover the important case of a
master shut down cleanly, but with a combination of a smart
shutdown and subsequently a fast shutdown. Since our pg_rewind
code does not depend on the "uncleanness" of the master's shutdown,
we can call it unconditionally in all cases where the former master
tries to rejoin as a replica.

This resolves #167.
2016-04-11 17:56:18 +02:00
Alexander Kukushkin
3a7d2c3874 Remove unused code from unit tests 2016-03-21 20:48:17 +01:00
Oleksii Kliukin
e802bba5f9 Merge pull request #151 from zalando/feature/base_backup_from_the_replica
First implementation of cloning from the replica.
2016-03-11 16:57:06 +01:00
Oleksii Kliukin
805716ed68 Variables and parameters renaming.
Previously, "without_leader" suffix was used in the name of methods
and functions that initialize a replica without an active replication
connection, and leader was part of the name for parameters and messages
that require an active replication connection. Since we support init
from the members other than the leader, those conventions have to be
changed.
2016-03-11 10:19:00 +01:00
Alexander Kukushkin
a38af0949b Merge branch 'master' of github.com:zalando/patroni into feature/xlog_lag_interval 2016-02-23 14:48:49 +01:00
Alexander Kukushkin
f7d60c61b6 remove unused code 2016-02-17 14:09:00 +01:00
Alexander Kukushkin
de129b733d Fix unit tests 2016-02-17 12:46:32 +01:00
Alexander Kukushkin
1bc22727d5 patroni/postgresql.py
directory could disappear after a successful call of isdir
2016-02-15 14:50:59 +01:00
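
The race mentioned in 1bc22727d5 is the usual check-then-use problem: a directory that existed when `isdir` was called can be gone by the time it is listed. One way to cope, shown here as a sketch with a hypothetical function name (the actual change is not visible in this log), is to attempt the operation and handle the failure.

```python
import os


def data_directory_empty(data_dir):
    """Return True if data_dir is missing or contains no entries."""
    try:
        return not os.listdir(data_dir)
    except FileNotFoundError:
        # The directory vanished between any earlier check and this call.
        return True
```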
Alexander Kukushkin
b973ed7e4f improve test coverage 2016-02-12 16:52:26 +01:00
Alexander Kukushkin
df9b8fed2e Improve quality of code by resolving issues found by quantifiedcode and codacy 2016-02-12 12:23:49 +01:00
Oleksii Kliukin
14b8dfa3e8 Make create_replica_method a YAML array.
Make sure the absence of this key or empty value in it is handled
correctly. Update tests and sample configuration files.
2015-11-25 10:29:17 +01:00
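
Handling a missing or empty `create_replica_method` key, as 14b8dfa3e8 describes, might look like the sketch below. It operates on an already parsed configuration dict. The function name, the fallback to `basebackup`, and the tolerance for a bare string are assumptions, not behaviour confirmed by this log.

```python
def replica_methods(config, default=('basebackup',)):
    """Return the configured replica creation methods as a list."""
    value = config.get('create_replica_method')
    if not value:
        # Key absent, null, or an empty list: use the default.
        return list(default)
    if isinstance(value, str):
        # Tolerate a single method written as a plain string.
        return [value]
    return list(value)
```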
Oleksii Kliukin
35efd36c5c Improve unittests and make minor bugfixes.
In particular, remove restore.py in favor of
wale_restore.py, fix minor bugs in the latter
and add unit tests.
2015-11-24 15:21:47 +01:00
Oleksii Kliukin
e625c33bef Merge branch 'master' into pgexperts-restore/movebasebackup 2015-11-23 15:42:26 +01:00
Oleksii Kliukin
c003af294a Merge pull request #82 from zalando/feature/patroni_cli_or_ctl_tbd
Feature/patroni cli or ctl tbd
2015-11-18 16:17:02 +01:00
Feike Steenbergen
ca4d9eaaf9 Patronictl: Expand tests to increase coverage 2015-11-18 11:51:24 +01:00