UP | HOME

SQL Basics for Smart People

Table of Contents

1 Overview

Some simple SQL (Structured Query Language) basics for smart people.

2 Example: Entity Database With Tags

SQL is a bunch of tables, with rows and columns.

Say you have a "database" of "entities" (which I'll assume are basically BLOBs – binary large objects) and you want them tagged with various arbitrary tags.

So, like:

  • Gryphon (entity)
    • flying
    • walking
    • yellow
  • Phoenix (entity)
    • flying
    • rainbow
    • fast
  • Gelatinous cube (entity)
    • walking
    • pink
    • slow
  • Orc (entity)
    • walking
    • green

And you want to store them in a database so you can query against tags (say "flying" AND "walking").

So, you'd have three tables:

  • the entities themselves
  • the tags themselves (so you can do things like present the user with a warning "The tag 'wlaking' doesn't exist. Are you sure you want to create a new tag?"
  • the connections between entities and tags

2.1 Entities

Table name: Entity

EntityID EntityName EntityBlob
1 Gryphon gryphonBlob
2 Phoenix phoenixBlob
3 Gelatinous Cube gelatinousCubeBlob
4 Orc orcBlob

2.2 Tags

Table name: Tag

TagID TagName
1 Flying
2 Walking
3 Yellow
4 Rainbow
5 Fast
6 Pink
7 Slow
8 Green

2.3 Join/Bridge/Junction/Gerund/Cross-reference Table

Hard to know what to call these types of tables. I've seen several different names for them. This is how you do "many-to-many" relationships (each entity has many tags, and each tag tags many entities).

Table name: EntityTag

EntityTagID EntityID TagID
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 2 5
7 3 2
8 3 6
9 3 7
10 4 2
11 4 8

2.4 Sample database load script

Just for grins, I took this for a test run using SQLiteStudio. Here's the load script so you don't have to type all that data in yourself:

SQL/play.sql

(Note to self: I changed the metadata on this particular object in the S3 bucket to set Content-type to text/plain. There's probably a wildcard way to do it, but it might need to be done after a sync?)

3 The four SQL verbs

Well, the four main ones.

3.1 SELECT – Selects rows out of a table

SELECT * FROM Entity

(Selects all the columns from the Entity table.)

SELECT EntityName FROM Entity

(Selects just the name column from the table.)

SELECT e.EntityName
        , COUNT(*) AS TagCount  -- Counts rows, requires GROUP BY clause (below)
    FROM Entity e
        JOIN EntityTag et ON et.EntityID = e.EntityID
        JOIN Tag t ON t.TagID = et.TagID
    WHERE t.TagName IN ('Flying', 'Walking')
    GROUP BY e.EntityName
    HAVING TagCount = 2

(Or something. I haven't actually set this up and tested it. [Narrator: it works.] You could probably also do this with an "inner SELECT", but that might be (a lot) less efficient.)

(Returns the names of all entities that have both 'Flying' and 'Walking' tags. Here, we're joining all three tables together into one monstrous virtual table and specifying how to match the rows up (otherwise, we'd wind up with a cross-product between all the rows, which is (usually) not what you want.))

(In general, SQL strings use single quotes.)

(Technical note: if you were parameterizing this query, you'd probably want to pass the desired tags in as a table-typed parameter (i.e., a collection of strings, not a single string) and have another JOIN clause with the passed-in table of tag names.)

3.1.1 Basic SELECT syntax

Basically, it's:

SELECT <columns> FROM <table(s)> [WHERE <condition>] [ORDER BY <columns>]

There are lots of variations and clever extra little fillips, as you can see above.

3.1.2 Inner SELECT

SELECT e.EntityName
    FROM Entity e
    WHERE EXISTS (SELECT t.TagName
                    FROM Tag t
                        JOIN EntityTag et on et.TagID = t.TagID AND et.EntityID = e.EntityID
                    WHERE t.TagName = 'Flying')
        AND EXISTS (SELECT t.TagName
                    FROM Tag t
                        JOIN EntityTag et on et.TagID = t.TagID AND et.EntityID = e.EntityID
                    WHERE t.TagName = 'Walking')

The reason this is less efficient is that we get essentially nested loops, where we iterate over the Entity rows, and, for each row, execute two more SELECT statements, querying the database in a loop with different parameters. A lot of SQL engines can key off the EXISTING keyword and get smart with optimizations, but, still, inner SELECTS can get you into a lot of trouble, performance-wise.

3.2 INSERT – Inserts rows into a table

INSERT Entity (EntityName, EntityBlob) VALUES ('Unicorn', /unicornBlob/)

3.3 UPDATE – Updates an existing row in a table

(Fix a misspelling.)

UPDATE Tag SET TagName = 'Walking' WHERE TagName = 'Wlaking'

3.4 DELETE – Deletes a row from a table

(In real systems, this isn't often used. Instead, we usually just UPDATE a row to SET a flag to indicate that it's "logically" deleted.)

DELETE EntityTag WHERE EntityID = 1 AND TagID = 7

4 Moar Knowledge

You can fool around with SQL by installing SQLiteStudio from https://sqlitestudio.pl/.

sqlite is a well-known lightweight implementation of a SQL database that doesn't get too complicated (so, not multi-user, no transactions, etc. Although it may have come some distance since I last used it – anyway, if you want a serious database that can handle concurrency, you should try https://www.postgresql.org/).

https://www.lynda.com/SQL-training-tutorials/446-0.html?category=developer_50

Or this, I dunno: https://learntocodewith.me/posts/sql-guide/

Created: 2023-08-06 Sun 14:23

Emacs 27.2 (Org mode 9.4.4)

Validate