Thursday, February 10, 2022

Shared Memory for PostgreSQL


Came across this post while looking for Postgres shared memory configuration. 
Posting the link for source reference

SourceAlexey Vasiliev

Setting up shared memory for PostgreSQL

Hello my dear friends. In this article I will cover about setting up shared memory on Linux for PostgreSQL.

What is shared memory?

Shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between programs. Using memory for communication inside a single program, for example among its multiple threads, is also referred to as shared memory.

PostgreSQL and shared memory

The “shared_buffers” configuration parameter determines how much memory is dedicated to PostgreSQL to use for caching data. One reason the defaults are low is because on some platforms (like older Solaris versions and SGI), having large values requires invasive action like recompiling the kernel. Even on a modern Linux system, the stock kernel will likely not allow setting shared_buffers to over 32MB without adjusting kernel settings first.

If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system. If you have less RAM you’ll have to account more carefully for how much RAM the OS is taking up; closer to 15% is more typical there. There are some workloads where even larger settings for shared_buffers are effective, but given the way PostgreSQL also relies on the operating system cache, it’s unlikely you’ll find using more than 40% of RAM to work better than a smaller amount.

It’s likely you will have to increase the amount of memory your operating system allows you to allocate at once to set the value for shared_buffers this high. On UNIX-like systems, if you set it above what’s supported, you’ll get a message like this

FATAL:  could not create shared memory segment: Invalid argument
DETAIL:  Failed system call was shmget(key=5440001, size=4011376640, 03600).

This error usually means that PostgreSQL's request for a shared memory
segment exceeded your kernel's SHMMAX parameter. You can either
reduce the request size or reconfigure the kernel with larger SHMMAX.
To reduce the request size (currently 4011376640 bytes), reduce
PostgreSQL's shared_buffers parameter (currently 50000) and/or
its max_connections parameter (currently 12).

probably means your kernel’s limit on the size of shared memory is smaller than the work area PostgreSQL is trying to create (4011376640 bytes in this example). Or it could mean that you do not have System-V-style shared memory support configured into your kernel at all. As a temporary workaround, you can try starting the server with a smaller-than-normal number of buffers (shared_buffers).

Configure shared memory

We must to setup our Unix system shared memory, if we want tune PostgreSQL. We can go by this link and see how to setup it. But what values to set for shmmax and shmall? To do this, I made a small script. It’s mission - to calculate and display the size of shared memory, which is half the available memory on the server:

#!/bin/bash
# simple shmsetup script
page_size=`getconf PAGE_SIZE`
phys_pages=`getconf _PHYS_PAGES`
shmall=`expr $phys_pages / 2`
shmmax=`expr $shmall \* $page_size`
echo kernel.shmmax = $shmmax
echo kernel.shmall = $shmall

For example, for a server with 2GB of RAM script will generate the following:

kernel.shmmax = 1055092736
kernel.shmall = 257591

Here SHMMAX maximum size (in bytes) on the segment of shared memory, is set to 1 GB. SHMALL - the total amount of shared memory (in pages), which all processes on the server can use. The number of bytes in a page depends on the operating system, basically, the default is 4096 bytes. In order to have these data have applied for Linux (Ubuntu, Debian), run this command from the root (./shmsetup - the script):

$ ./shmsetup >> /etc/sysctl.conf

And check that:

$ sysctl -p

Also do not forget about semaphores in the system:

$ ipcs -l
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

The values in sysctl:

$ sysctl kernel.sem = 250 32000 32 128

All four values may need to be increased on systems with a large number of processes.

PostgreSQL 9.3

In 9.3, PostgreSQL has switched from using SysV shared memory to using Posix shared memory and mmap for memory management. This allows easier installation and configuration of PostgreSQL, and means that except in usual cases, system parameters such as SHMMAX and SHMALL no longer need to be adjusted.

Summary

As can be seen, setting up shared memory on Linux for PostgreSQL is not so hard.  

Friday, January 21, 2022

Postgres parameters to check performance

Need to check/Set below parameters: 

show shared_preload_libraries;

show debug_print_plan;

show debug_print_parse;

show logging_collector;

show log_statement;

show log_min_duration_statement;

show log_temp_files;

show log_connections;

show log_disconnections;

show log_lock_waits;

show log_line_prefix;

show log_checkpoints;

show log_error_verbosity;


ALTER DATABASE DATABASE_NAME set shared_preload_libraries = pg_stat_statements;

ALTER DATABASE DATABASE_NAME set log_statement = 'all';

ALTER DATABASE DATABASE_NAME set log_min_duration_statement = 0;

ALTER DATABASE DATABASE_NAME set logging_collector = on;

ALTER DATABASE DATABASE_NAME set debug_print_plan = on;

ALTER DATABASE DATABASE_NAME set debug_print_parse = on;

ALTER DATABASE DATABASE_NAME set log_connections = on;

ALTER DATABASE DATABASE_NAME set log_disconnections = on;

ALTER DATABASE DATABASE_NAME set log_lock_waits = on;

ALTER DATABASE DATABASE_NAME set log_temp_files = 0;

ALTER DATABASE DATABASE_NAME set log_line_prefix = '%t:%r:%u@%d:[%p]: db=%d,user=%u,app=%a,client=%h';

ALTER DATABASE DATABASE_NAME set log_checkpoints = on;

ALTER DATABASE DATABASE_NAME set log_error_verbosity = default;



select tc.table_schema, tc.table_name, kc.column_name

from information_schema.table_constraints tc

  join information_schema.key_column_usage kc 

    on kc.table_name = tc.table_name and kc.table_schema = tc.table_schema and kc.constraint_name = tc.constraint_name

where tc.constraint_type = 'PRIMARY KEY'and kc.ordinal_position is not nullorder by tc.table_schema,

         tc.table_name,

         kc.position_in_unique_constraint;

Queries to check Postgres query performance

Some queries to check Postgres Performance


Show sessions older than 5 minutes

===================================

SELECT

pid,

now() - pg_stat_activity.query_start AS duration,

query,

state

FROM pg_stat_activity

WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes';


Show mean time per query

========================

select queryid,query as short_query,

round(total_time::numeric,2) as total_time,

calls, round(mean_time::numeric,2) as mean,

round((100 * total_time / sum(total_time::numeric)

OVER ())::numeric, 2) as percentage_overall

from pg_stat_statements order by total_time desc

limit 20;


Sequential scans on tables

===========================

select schemaname, relname, seq_scan, seq_tup_read, idx_scan, seq_tup_read / seq_scan as AVG

from pg_stat_user_tables

where seq_scan > 0

order by seq_tup_read desc;


Fragmentation of tables

=======================

SELECT

relname AS ObjectName

,pg_stat_get_live_tuples(c.oid) AS LiveTuples

,pg_stat_get_dead_tuples(c.oid) AS DeadTuples

FROM pg_class c

where pg_stat_get_live_tuples(c.oid) > 0 or pg_stat_get_dead_tuples(c.oid) > 0

order by pg_stat_get_live_tuples(c.oid),pg_stat_get_dead_tuples(c.oid) desc;


Show status of tables

=====================

select relname,last_vacuum, last_autovacuum, last_analyze, last_autoanalyze from pg_stat_all_tables

where relname='TABLE_NAME';


Check Locks:

============

select * from pg_locks;


Check Waits:

============

select pid, wait_event_type, wait_event from pg_stat_activity where wait_event is not null;


Any blocks or locked queries/sessions:

=======================================

SELECT

pl.pid as blocked_pid

,psa.usename as blocked_user

,pl2.pid as blocking_pid

,psa2.usename as blocking_user

,psa.query as blocked_statement

FROM pg_catalog.pg_locks pl

JOIN pg_catalog.pg_stat_activity psa

ON pl.pid = psa.pid

JOIN pg_catalog.pg_locks pl2

JOIN pg_catalog.pg_stat_activity psa2

ON pl2.pid = psa2.pid

ON pl.transactionid = pl2.transactionid

AND pl.pid != pl2.pid

WHERE NOT pl.granted;


Connections into the DB

==========================

SELECT state, count(*) FROM pg_stat_activity GROUP BY state;


Connections waiting for a lock

================================

SELECT count(distinct pid) FROM pg_locks WHERE granted = false;


Thursday, July 29, 2021

Republishing Post from  for my reference 

/ / / Auditing Users and Roles in PostgreSQL


Auditing Users and Roles in PostgreSQL

One of the services we offer are security reviews (or audits, if you want), covering a range of areas related to security. It may be a bit surprising, but a topic that often yields the most serious issues is roles and privileges. Perhaps the reason why roles and privileges are a frequent source of issues is that it seems to be quite simple and similar to things the engineers are familiar with (e.g. Unix system of users and groups), but it turns out there are a few key differences with major consequences.

The other parts are either very straightforward and understandable even for sysadmins without much PostgreSQL experience (e.g. authentication config in pg_hba.conf), or the engineers recognize the complexity and take their time to familiarize with the details (a good example of this is Row Level Security).

That is not to say there are no interesting topics e.g. how to use RLS with application-level users but I’ll leave that for another blog post, as this one is about roles and privileges.

So let’s look at roles and privileges a bit a closer …

Owner is a small superuser

When it comes to roles, the initial checks are mostly expected. The role should not be a superuser (as superusers simply bypass various checks), and in general should not have any excessive privileges (e.g. CREATEDB, CREATEROLE and so on).

But it also should not own the database objects (tables, functions, …), since owners can simply grant themselves arbitrary privileges on the objects they own, which turns them into small superusers.

Consider the following example, where we attempt to protect the table from the owner by revoking all the privileges from that role:

db=# CREATE USER u;
CREATE ROLE

db=# SELECT rolsuper FROM pg_roles WHERE rolname = 'u';
 rolsuper
----------
 f
(1 row)

db=# \c 'user=u dbname=db'
You are now connected to database "db" as user "u".

So we have created a user who is not a superuser, and we have connected using that account (that’s the slightly cryptic psql command). Let’s create a table (so the user is an owner) and restrict our own access to it

db=> CREATE TABLE t (id INT);
CREATE TABLE

db=> REVOKE ALL ON t FROM u;
REVOKE

db=> SELECT * FROM t;
ERROR:  permission denied for relation t

So that works, right? Well, the problem is a user with access to SQL (e.g. an “attacker” that discovered a SQL injection vulnerability) can do this:

db=> GRANT ALL ON t TO u;
GRANT

db=> select * from t;
 id
----
(0 rows)

The owner can simply grant all privileges back to himself, defeating the whole privilege system. A single SQL injection vulnerability and it’s a game over. Another issue with owners is that they are not subject to RLS by default, although that can be fixed with a simple ALTER TABLE ... FORCE ROW LEVEL SECURITY.

In any case, this should be a clear hint that the application should use a dedicated role (or multiple roles), not owning any of the objects.

BTW users are often surprised when I mention that we can grant privileges to individual columns e.g. allow SELECT on a subset of columns, UPDATE on a different subset of columns, and so on.

When combined with SECURITY DEFINER functions, this is a great way to restrict access to columns the application should not access directly, but allow special operations. For example it shouldn’t be possible to select all passwords (even if hashed) or e-mails, but it should be possible to verify a password or an e-mail. SECURITY DEFINER functions are great for that, but sadly it’s one of the powerful yet severely underused features 🙁

Role inheritance

Let’s assume you have a role that owns the objects, and a separate role used by the application. In fact, if you have sufficiently complex application, chances are you’ve split it into multiple parts, perhaps segregated into schemas, and each module uses a separate set of roles (owner + application, possibly more).

This gives you the ability to create application roles covering only part of the application e.g. the administration panel needs access to all modules, while a public web interface only needs read-only access to a small subset of modules.

CREATE ROLE module_users;    -- full access to user info
CREATE ROLE module_users_ro; -- limited access user info (register/verify)
CREATE ROLE module_posts;    -- full access to blog posts
CREATE ROLE module_posts_ro; -- read-only access to blog posts
... roles for additional modules ...

CREATE USER admin_user   -- full access
    IN ROLE module_users, module_posts;

CREATE USER web_user     -- limited access
    IN ROLE module_users_ro, module_posts_ro;

In other words, roles may be seen as groups and used for making the privileges easier to manage. There are two aspects that make this different from unix-like groups it’s possible to use multi-level hierarchy of roles (while Unix groups are flat), and inheritance (will get to that in a minute).

The above scheme works just fine, but only if you keep the connections for the two users (admin_user and web_user) separate. With a small number of users (modules, applications) that’s manageable, as you can maintain separate connection pools, but as the number of connection pools grows it ceases to serve the purpose. But can we use a single connection pool and keep the benefit of separate users?

Well, yes. We can create another user role for the connection pool and grant it all the existing users (admin_user and web_user).

CREATE USER pool_user IN ROLE admin_user, web_user

This seems a bit strange, because the new user becomes member of admin_user and web_user roles (users are just roles with LOGIN privilege), effectively inheriting all the privileges. Wasn’t the whole point using roles with limited privileges?

Let me introduce you the SET ROLE command, which can be used to switch the session to arbitrary role the user is member of. So as the pool_user user is member of both admin_user and web_user roles, the connection pool or application may use this:

SET ROLE admin_user

to switch it to “full” privileges for the admin interface, or

SET ROLE web_user

when the connection is intended for the website.

These commands are akin to dropping privileges in Unix. The init scripts are executed as root, but you really don’t want to run all the services as root, so the init script does something like sudo -u or chpst to switch to unprivileged user.

But wait, we can actually do the opposite. We can start with “no privileges” by default, all we need to do is create the role like this:

CREATE USER pool_user NOINHERIT IN ROLE admin_user, web_user

The user is still member of the two roles (and so can switch to them using SET ROLE), but inherits no privileges from them. This has the benefit that if the pool or application fails to do the SET ROLE, it will fail due to lack of privileges on the database objects (instead of silently proceeding with full privileges). So instead of starting with full privileges and eventually dropping most of them, with NOINHERIT we start with no privileges and then acquire a limited subset of them.

But why am I wasting time by explaining all this SET ROLE and INHERIT or NOIHERIT stuff? Well, it has implications on testing.

Note: You have to trust the pool/application to actually execute the SET ROLE command with the right target role, and the user must not be able to execute custom SQL on the connection (because then it’s just a matter of RESET ROLE to gain the full privileges, or SET ROLE to switch to another role). If that’s not the case, the shared connection pool is not a path forward for you.

Testing roles

Pretty much no one tests privileges. Or to be more accurate everyone tests the positive case implicitly, because if you don’t get the necessary privileges the application breaks down. But only very few people verify that there are no unnecessary/unexpected privileges.

The most straightforward way to test absence of privileges (user has no access) might be to walk through all existing objects (tables, columns) and try all compatible privileges. But that’s obviously a lot of combinations and a lot of additional schema-specific work (data types, constraints, …).

Luckily, PostgreSQL provides a collection of useful functions for exactly this purpose (showing just table-related ones, there are additional functions for other object types):

  • has_any_column_privilege(...)
  • has_column_privilege(...)
  • has_table_privilege(...)

So for example it’s trivial to check which roles have INSERT privilege on a given table:

SELECT rolname FROM pg_roles WHERE has_table_privilege(rolname, 'table', 'INSERT')

or listing tables accessible by a given role:

SELECT oid, relname FROM pg_class WHERE has_table_privilege('user', oid, 'INSERT')

And similarly for other privileges and object types. The testing seems fairly trivial – simply run a bunch of queries for the application users, check that the result matches expectation and we’re done.

Note: It’s also possible to use the information_schema, e.g. table_privileges which essentially just runs a query with has_table_privilege and formats the output nicely.

Except there’s a small catch – the inheritance. It works just fine as long as the role inherits privileges through membership, but as soon as there’s a NOINHERIT somewhere, those privileges will not be considered when checking the access (both in the functions and information_schema). Which makes sense, because the current user does not currently have the privileges, but can gain them easily using SET ROLE.

But of course, PostgreSQL also includes pg_has_role() function, so we can merge the privileges from all the roles, for example like this:

SELECT DISTINCT relname
  FROM pg_roles CROSS JOIN pg_class
 WHERE pg_has_role('user', rolname, 'MEMBER')
   AND has_table_privilege(rolname, pg_class.oid, 'SELECT')

Making this properly testable requires more work (to handle additional object types and applicable privileges), but you get the idea.

Summary

Let me briefly summarize this blog post:

  • separate the owner and application user – Don’t use a single role for both things.
  • consider using SET ROLE role – Either drop (INHERIT) or acquire (NOINHERIT).
  • test the expected privileges – Ideally run this as part of regular unit tests if possible.
  • keep it simple – It’s definitely better to have a simple hierarchy of roles you understand.

Thursday, June 3, 2021

PostgreSQL Features

 

List of PostgreSQL Features for quick review before interview

Source:https://www.postgresql.org/about/

Below is an inexhaustive list of various features found in PostgreSQL, with more being added in every major release:

  • Data Types
    • Primitives: Integer, Numeric, String, Boolean
    • Structured: Date/Time, Array, Range, UUID
    • Document: JSON/JSONB, XML, Key-value (Hstore)
    • Geometry: Point, Line, Circle, Polygon
    • Customizations: Composite, Custom Types
  • Data Integrity
    • UNIQUE, NOT NULL
    • Primary Keys
    • Foreign Keys
    • Exclusion Constraints
    • Explicit Locks, Advisory Locks
  • Concurrency, Performance
    • Indexing: B-tree, Multicolumn, Expressions, Partial
    • Advanced Indexing: GiST, SP-Gist, KNN Gist, GIN, BRIN, Covering indexes, Bloom filters
    • Sophisticated query planner / optimizer, index-only scans, multicolumn statistics
    • Transactions, Nested Transactions (via savepoints)
    • Multi-Version concurrency Control (MVCC)
    • Parallelization of read queries and building B-tree indexes
    • Table partitioning
    • All transaction isolation levels defined in the SQL standard, including Serializable
    • Just-in-time (JIT) compilation of expressions
  • Reliability, Disaster Recovery
    • Write-ahead Logging (WAL)
    • Replication: Asynchronous, Synchronous, Logical
    • Point-in-time-recovery (PITR), active standbys
    • Tablespaces
  • Security
    • Authentication: GSSAPI, SSPI, LDAP, SCRAM-SHA-256, Certificate, and more
    • Robust access-control system
    • Column and row-level security
    • Multi-factor authentication with certificates and an additional method
  • Extensibility
    • Stored functions and procedures
    • Procedural Languages: PL/PGSQL, Perl, Python (and many more)
    • SQL/JSON path expressions
    • Foreign data wrappers: connect to other databases or streams with a standard SQL interface
    • Customizable storage interface for tables
    • Many extensions that provide additional functionality, including PostGIS
  • Internationalization, Text Search
    • Support for international character sets, e.g. through ICU collations
    • Case-insensitive and accent-insensitive collations
    • Full-text search

There are many more features that you can discover in the PostgreSQL documentation. Additionally, PostgreSQL is highly extensible: many features, such as indexes, have defined APIs so that you can build out with PostgreSQL to solve your challenges.

Tuesday, June 1, 2021

Topics to Prepare for an Interview: (C, Linux and Networking Interview Questions)

 

This article captures the list of topics to be covered/prepared for an Interview in Networking domain with C as a programming language and Linux as a work environment. This will help the candidate/employee who is experienced, having no idea of how to prepare (begin with) for an interview. All the topics covered here are collected from different experienced employees who gave interview for different companies on Networking domain.

Following are the list of topics which needs to be covered in order to prepare for an interview in Networking domain,

1.    Storage classes in C (Important)

>> Most of the questions will be on static and extern variables

2.    Memory layout of a process in Linux

>> Correlation of memory layout with storage classes

3.    Inter Process Communication (IPC)

>> Message queue, Shared memory, Pipes, FIFO, Semaphore, IOCTL, Sockets -Learn each concept in detail with their use cases

4.    Multi-threading

>> POSIX threads and mutex, critical section problem

5.    Process synchronization Techniques

>> Semaphores and Mutex and their use cases

6.    Socket programming

>> Different types of sockets such as TCP, UDP, RAW, NETLINK, UNIX Domain socket and their use cases (When to use)

7.    C data structures

>> Linked list (Singly linked list and Doubly linked list), Stack, Queue - (add a node (add a node to the front of the list or at the end of the list), delete a node (delete first node or delete a node from the end of the list), reverse a linked list, push-pop operations on stack, detect a loop in linked list, intersection in linked list etc.

8.    Knowledge about your core domain you worked on (Important)

>> Prepare well on your domain (Ex. Firmware, Wi-Fi, L2/L3 etc.) and you should be in a position to explain all the works that you have done on your domain.

9.    Pointers and de-referencing

>> Recall basics of pointers (Good to have deep knowledge on pointers and its use cases)

10. Implementation of our own functions for memcpy, strrev, memmove, sizeof, strlen etc.

>> Understand how each function is actually implemented.

11. Concepts of networking protocols such as DHCP, ARP, ICMP, TCP/IP, NAT, Routing, switching, VLAN etc.

>> Good to know the use cases and scenarios

12. Operations on bits (Important)

>> Toggle a bit, clear a bit, set a bit, count number of bits set in a given number, reverse bits of a number etc.

13. Operating System concepts

>> Process states, scheduling, dispatching, paging, page fault, virtual memory concept etc.

14. Linux internals

>> Interrupt, ISR, system boot up sequence, software interrupts, system calls etc.

15. Dynamic memory allocation concepts

>> malloc, calloc, realloc (return value of these functions and how these memory allocations implemented)

16. Good understanding of C programming language.

>> Brush up on C programming concepts such as function pointer, call back function, pointer arithmetic, MACRO, inline functions, structures padding and packing etc. 



Hope this article will help you to start your preparation for the interview. I will add more topics in future. Keep visiting this space for more update.