Copyright (C) 2022 Broadcom. All rights reserved.
The term "Broadcom" refers solely to the Broadcom Inc. corporate affiliate that
owns the software below. This work is licensed under the OpenAFC Project
License, a copy of which is included with this software program.
# Tools For Working With Log Databases

## Table of Contents

- Databases
- als_siphon.py - Moving Logs From Kafka To Postgres
- als_query.py - Querying Logs From Postgres Database

## Databases
ALS (AFC Log Storage) functionality revolves around two PostgreSQL databases used for log storage: ALS and AFC_LOGS.

### ALS Database

Stores logs of AFC Request/Response/Config data. Has a rather convoluted multi-table structure.

SQL code for creating this database is contained in the `ALS.sql` file. This file should be considered generated and must not be edited manually.

The `als_db_schema` folder contains the source material for `ALS.sql` generation:
- `ALS.dbml`. Source file for the dbdiagram.io DB diagramming site. Copy/paste the content of this file there, make modifications, then copy/paste it back into this file.
  Upon completion, `ALS_raw.sql` and `ALS.png` should also be exported (as *Export to PostgreSQL* and *Export to PNG* respectively) - see below.
- `ALS_raw.sql`. Database creation SQL script that should be exported from dbdiagram.io after changes are made to `ALS.dbml`.
  This file is almost the final `ALS.sql`, but requires certain tweaks (see the sketch after this list):
  - Declaring the PostgreSQL extensions used (PostGIS in this case).
  - Removal of many-to-many artifacts. For many-to-many relationships dbdiagram.io creates artificial tables that, to add insult to injury, violate PostgreSQL syntax. They are not used and should be removed.
  - Segmentation. The database is designed with segmentation in mind (by the `month_idx` field), but the segmentation itself is not performed. This will need to be done eventually.
- `als_rectifier.awk`. AWK script for converting `ALS_raw.sql` to `ALS.sql`.
- `ALS.png`. Picturesque database schema. Should be exported as PNG after changes are made to `ALS.dbml`.
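For example, the first tweak (declaring the PostGIS extension) amounts to a line like the following near the top of the generated SQL (a sketch only; the authoritative text is whatever `als_rectifier.awk` actually puts into `ALS.sql`):

```sql
-- Extension declaration tweak (sketch); ALS.sql is the authoritative source
CREATE EXTENSION IF NOT EXISTS postgis;
```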
### AFC_LOGS Database

For each JSON log type (*topic* in Kafka parlance) this database has a separate table with the following columns:

- `time`. Log record timetag with timezone.
- `source`. String that uniquely identifies the entity that created the log record.
- `log`. JSON log record.
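For illustration, a table for a hypothetical topic `foo` might look roughly like this (a sketch only: the column types are inferred from the descriptions above, and the actual DDL is produced by `als_siphon.py`):

```sql
-- Hypothetical topic table; the real DDL is generated by als_siphon.py
CREATE TABLE foo (
    time   TIMESTAMPTZ,  -- log record timetag with timezone
    source TEXT,         -- entity that created the log record
    log    JSON          -- the JSON log record itself
);
```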
### Initial Database

To create databases, the `als_siphon.py` script should connect to an already existing database. This already existing database is named the *initial database*; by default it is the built-in database named `postgres`.

### Template Databases

Template databases are used for the creation of the ALS and AFC_LOGS databases. Something other than the default might be used (but not yet, as of the time of this writing).
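Conceptually, initialization follows the standard PostgreSQL pattern: connect to the initial database, then create the target databases from a template. A rough sketch (the statements below are illustrative, not the exact ones issued by `als_siphon.py`):

```sql
-- Issued while connected to the initial database (postgres by default)
CREATE DATABASE "ALS" TEMPLATE template1;
CREATE DATABASE "AFC_LOGS" TEMPLATE template1;
```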
## als_siphon.py - Moving Logs From Kafka To Postgres

The main purpose of `als_siphon.py` is to fetch log records from Kafka and move them to the previously described PostgreSQL databases. It can also initialize those databases.

```
$ als_siphon.py COMMAND PARAMETERS
```

Commands are:

- `init`. Create the ALS and/or AFC_LOGS databases. If they already exist, the databases may be recreated or left intact.
- `siphon`. Do the moving from Kafka to PostgreSQL.
- `init_siphon`. First create the databases, then do the siphoning. Used for Docker operation.

Parameters are many - see the help messages.
## als_query.py - Querying Logs From Postgres Database

This script queries logs stored in the ALS and AFC_LOGS databases.

As of the time of this writing this script only supports the `log` command, which reads JSON logs from AFC_LOGS.

### Installation

`als_query.py` requires Python 3 with reasonably recent `sqlalchemy`, `psycopg2`, and `geoalchemy2` modules installed (the latter is optional - not required for e.g. the `log` command).

Proper installation of these modules requires too much luck to be described here (even venv/virtualenv does not always help - only sometimes). If you succeed - fine; otherwise there is one more method: running from the container where `als_siphon.py` is installed. In the latter case the invocation looks like this:

```
$ docker exec SIPHON_CONTAINER als_query.py CMD ...
```

Here `SIPHON_CONTAINER` is either a value from the first column of `docker ps` or from the last column of `docker-compose ps`.
### Addressing PostgreSQL Server

Another important aspect is how to reach the PostgreSQL database server where the logs were placed.

#### Explicit specification

Use the `--server` (aka `-s`) and `--password` parameters of the `als_query.py` command line. Here are the most probable cases:

1. `als_query.py` runs inside the `als_siphon.py` container, PostgreSQL runs inside the container named `bulk_postgres` in `docker-compose.yaml` (that's how it is named as of the time of this writing):

   ```
   $ docker exec SIPHON_CONTAINER als_query.py CMD \
       --server [USER@]als_postgres[:PORT][?OPTIONS] [--password PASSWORD] ...
   ```

   Here `USER` or `PORT` might be omitted if they are `postgres` and `5432` respectively. `--password PASSWORD` and `OPTIONS` are optional.
   Actually, in this case `--server` and `--password` may be omitted altogether - see below on the use of environment variables.

2. The user/host/port of the PostgreSQL server is known:

   ```
   $ [docker exec SIPHON_CONTAINER] als_query CMD \
       --server [USER@]HOST[:PORT][?OPTIONS] [--password PASSWORD] ...
   ```

3. `als_query.py` runs outside a container, PostgreSQL runs inside a container:

   ```
   $ als_query.py CMD \
       --server [USER@]^POSTGRES_CONTAINER[:PORT][?OPTIONS] \
       [--password PASSWORD] ...
   ```

   Note the **^** before `POSTGRES_CONTAINER`. Here, again, `POSTGRES_CONTAINER` is either a value from the first column of `docker ps` or from the last column of `docker-compose ps` for the container running PostgreSQL.

I expect #1 to be the common case for the development environment, #2 - for the deployment environment, #3 - for illustrations (for the sake of brevity) or for some lucky conditions.
#### Environment variables

If the `--server` parameter is not specified, `als_query.py` attempts to use environment variables:

- `POSTGRES_LOG_USER`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_LOG_PASSWORD` for accessing the AFC_LOGS database
- `POSTGRES_ALS_USER`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_ALS_PASSWORD` for accessing the ALS database

These environment variables are passed to the container with `als_siphon.py`, so they are quite a natural choice when running `als_query.py` from there (case #1 above).

Hence for case #1 the `als_query.py` command line would actually look like this:

```
$ docker exec SIPHON_CONTAINER als_query.py CMD ...
```

Where `...` does not contain `--server`.
### log Command

The `log` command retrieves JSON logs from the AFC_LOGS database. Each JSON log belongs to a certain *topic* (a handy term that originated from Kafka). A topic is a string (lowercase highly recommended, the name 'ALS' must not be used) that supposedly corresponds to the format (content) of the JSON data.

The topic specifies the name of a table inside the AFC_LOGS database.

Since the JSON content may be anything and PostgreSQL already provides special `SELECT` syntax for accessing JSON data (see e.g. here and here, google for further assistance), the `log` command is, in fact, a thin wrapper around the `SELECT` command, plus a couple of additional options.
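For instance, an invocation like `als_query.py log "log->'c'->'d' from foo"` (see the examples below) effectively makes PostgreSQL execute the following statement, using its `->` JSON field access operator (a sketch of what the server sees; exact quoting and assembly are up to the wrapper):

```sql
-- What the log command effectively sends to PostgreSQL (sketch)
SELECT log->'c'->'d' FROM foo;
```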
Each table in AFC_LOGS has the following columns (this is important when composing `SELECT` statements):

| Column | Content |
|---|---|
| time | Time when the log record was made (includes date, time, timezone) |
| source | Entity (e.g. WEB server) that made the record |
| log | JSON log data |
Command format:

```
$ [docker exec SIPHON_CONTAINER] als_query.py log OPTIONS [SELECT_BODY]
```

| Parameter | Meaning |
|---|---|
| --server/-s [USER@][^]HOST_OR_CONTAINER[:PORT][?OPTIONS] | PostgreSQL server connection parameters. See the discussion in the *Addressing PostgreSQL Server* chapter. This parameter is mandatory unless the environment variables described above are set |
| --password PASSWORD | PostgreSQL connection password (if required) |
| --topics | List existing topics (database tables) |
| --sources [TOPIC] | List sources - all or from a specific topic |
| --format/-f {bare\|json\|csv} | Output format for SELECT-based queries: *bare* - unadorned single-column output, *csv* - output as CSV table (default), *json* - output as a JSON list of row dictionaries |
| SELECT_BODY | SQL SELECT statement body (without the leading `SELECT` and trailing `;`). May be unquoted, but most likely requires quotes because of special symbols like `*`, `>`, etc. |
### log Command Examples

Suppose that:

- There are various topics (tables), among which there is the topic `foo` (let me remind you again that lowercase topic names are recommended), filled with JSONs with a structure similar to this:

  ```
  {
      "a": 42,
      "b": [1, 2, 3],
      "c": {"d": 57}
  }
  ```

- `als_query.py` runs in the `regression_als_siphon_1` container (YMMV - see the output of `docker-compose ps`). In this case there is no need to pass the `--server` parameter, as it will be taken from environment variables.

Now, here are some possible actions:
- List all topics:

  ```
  $ docker exec regression_als_siphon_1 als_query.py log --topics
  ```

  Note that there is no `--server` parameter here, as `als_query.py` would use the values passed via environment variables.

- Print the content of the `foo` topic (table) in its entirety, using CSV format:

  ```
  $ docker exec regression_als_siphon_1 als_query.py log "* from foo"
  ```

  This invokes `SELECT * from foo;` on the AFC_LOGS database of the PostgreSQL server.

- Print the key names of the JSONs of topic `foo`:

  ```
  $ docker exec regression_als_siphon_1 als_query.py log \
      json_object_keys(log) from foo
  ```

  Note that quotes may be omitted here, as there are no special symbols in the select statement.

- From topic `foo` print the values of `c.d` for all records, using bare (unadorned) format:

  ```
  $ docker exec regression_als_siphon_1 als_query.py log \
      -f bare "log->'c'->'d' from foo"
  ```

  Note the quotes around field names.

- From topic `foo` print only the values of `b[0]` for all records where field `a` equals 179:

  ```
  $ docker exec regression_als_siphon_1 als_query.py log \
      "log->'b'->0 from foo where log->'a' = 179"
  ```

  Note the way list indexing is performed (`->0`).

- Print the maximum value of field `a` in topic `foo`:

  ```
  $ docker exec regression_als_siphon_1 als_query.py log "MAX(log->'a') from foo"
  ```

- Print log records in a given time range:

  ```
  $ docker exec regression_als_siphon_1 als_query.py log \
      "* from foo where time > '2023-02-08 23:25:54.484174+00:00'" \
      "and time < '2023-02-08 23:28:54.484174+00:00'"
  ```