SQL Basics for Smart People
Table of Contents
1 Overview
Some simple SQL (Structured Query Language) basics for smart people.
2 Example: Entity Database With Tags
SQL is a bunch of tables, with rows and columns.
Say you have a "database" of "entities" (which I'll assume are basically BLOBs – binary large objects) and you want them tagged with various arbitrary tags.
So, like:
- Gryphon (entity)
- flying
- walking
- yellow
- Phoenix (entity)
- flying
- rainbow
- fast
- Gelatinous cube (entity)
- walking
- pink
- slow
- Orc (entity)
- walking
- green
And you want to store them in a database so you can query against tags (say "flying" AND "walking").
So, you'd have three tables:
- the entities themselves
- the tags themselves (so you can do things like present the user with a warning
"The tag 'wlaking' doesn't exist. Are you sure you want to create a new tag?" - the connections between entities and tags
2.1 Entities
Table name: Entity
| EntityID | EntityName | EntityBlob |
| 1 | Gryphon | gryphonBlob |
| 2 | Phoenix | phoenixBlob |
| 3 | Gelatinous Cube | gelatinousCubeBlob |
| 4 | Orc | orcBlob |
2.2 Tags
Table name: Tag
| TagID | TagName |
| 1 | Flying |
| 2 | Walking |
| 3 | Yellow |
| 4 | Rainbow |
| 5 | Fast |
| 6 | Pink |
| 7 | Slow |
| 8 | Green |
2.3 Join/Bridge/Junction/Gerund/Cross-reference Table
Hard to know what to call these types of tables. I've seen several different names for them. This is how you do "many-to-many" relationships (each entity has many tags, and each tag tags many entities).
Table name: EntityTag
| EntityTagID | EntityID | TagID |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 4 |
| 6 | 2 | 5 |
| 7 | 3 | 2 |
| 8 | 3 | 6 |
| 9 | 3 | 7 |
| 10 | 4 | 2 |
| 11 | 4 | 8 |
2.4 Sample database load script
Just for grins, I took this for a test run using SQLiteStudio. Here's the load script so you don't have to type all that data in yourself:
(Note to self: I changed the metadata on this particular object in the S3 bucket to set Content-type to
text/plain. There's probably a wildcard way to do it, but it might need to be done after a sync?)
3 The four SQL verbs
Well, the four main ones.
3.1 SELECT – Selects rows out of a table
SELECT * FROM Entity
(Selects all the columns from the Entity table.)
SELECT EntityName FROM Entity
(Selects just the name column from the table.)
SELECT e.EntityName
, COUNT(*) AS TagCount -- Counts rows, requires GROUP BY clause (below)
FROM Entity e
JOIN EntityTag et ON et.EntityID = e.EntityID
JOIN Tag t ON t.TagID = et.TagID
WHERE t.TagName IN ('Flying', 'Walking')
GROUP BY e.EntityName
HAVING TagCount = 2
(Or something. I haven't actually set this up and tested it. [Narrator: it works.] You could probably also do this with an "inner SELECT", but that might be (a lot) less efficient.)
(Returns the names of all entities that have both 'Flying' and 'Walking' tags. Here, we're joining all three tables together into one monstrous virtual table and specifying how to match the rows up (otherwise, we'd wind up with a cross-product between all the rows, which is (usually) not what you want.))
(In general, SQL strings use single quotes.)
(Technical note: if you were parameterizing this query, you'd probably want to pass the desired tags in as a table-typed parameter (i.e., a collection of strings, not a single string) and have another JOIN clause with the passed-in table of tag names.)
3.1.1 Basic SELECT syntax
Basically, it's:
SELECT <columns> FROM <table(s)> [WHERE <condition>] [ORDER BY <columns>]
There are lots of variations and clever extra little fillips, as you can see above.
3.1.2 Inner SELECT
SELECT e.EntityName
FROM Entity e
WHERE EXISTS (SELECT t.TagName
FROM Tag t
JOIN EntityTag et on et.TagID = t.TagID AND et.EntityID = e.EntityID
WHERE t.TagName = 'Flying')
AND EXISTS (SELECT t.TagName
FROM Tag t
JOIN EntityTag et on et.TagID = t.TagID AND et.EntityID = e.EntityID
WHERE t.TagName = 'Walking')
The reason this is less efficient is that we get essentially nested loops, where we iterate over the Entity rows, and, for each row, execute two more SELECT statements, querying the database in a loop with different parameters. A lot of SQL engines can key off the EXISTING keyword and get smart with optimizations, but, still, inner SELECTS can get you into a lot of trouble, performance-wise.
3.2 INSERT – Inserts rows into a table
INSERT Entity (EntityName, EntityBlob) VALUES ('Unicorn', /unicornBlob/)
3.3 UPDATE – Updates an existing row in a table
(Fix a misspelling.)
UPDATE Tag SET TagName = 'Walking' WHERE TagName = 'Wlaking'
3.4 DELETE – Deletes a row from a table
(In real systems, this isn't often used. Instead, we usually just UPDATE a row to SET a flag to indicate that it's "logically" deleted.)
DELETE EntityTag WHERE EntityID = 1 AND TagID = 7
4 Moar Knowledge
You can fool around with SQL by installing SQLiteStudio from https://sqlitestudio.pl/.
sqlite is a well-known lightweight implementation of a SQL database that doesn't get too
complicated (so, not multi-user, no transactions, etc. Although it may have come some distance
since I last used it – anyway, if you want a serious database that can handle concurrency, you
should try https://www.postgresql.org/).
https://www.lynda.com/SQL-training-tutorials/446-0.html?category=developer_50
Or this, I dunno: https://learntocodewith.me/posts/sql-guide/