SQL Basics for Smart People
Table of Contents
1 Overview
Some simple SQL (Structured Query Language) basics for smart people.
2 Example: Entity Database With Tags
SQL is a bunch of tables, with rows and columns.
Say you have a "database" of "entities" (which I'll assume are basically BLOBs – binary large objects) and you want them tagged with various arbitrary tags.
So, like:
- Gryphon (entity)
- flying
- walking
- yellow
- Phoenix (entity)
- flying
- rainbow
- fast
- Gelatinous cube (entity)
- walking
- pink
- slow
- Orc (entity)
- walking
- green
And you want to store them in a database so you can query against tags (say "flying" AND "walking").
So, you'd have three tables:
- the entities themselves
- the tags themselves (so you can do things like present the user with a warning
"The tag 'wlaking' doesn't exist. Are you sure you want to create a new tag?"
- the connections between entities and tags
2.1 Entities
Table name: Entity
EntityID | EntityName | EntityBlob |
1 | Gryphon | gryphonBlob |
2 | Phoenix | phoenixBlob |
3 | Gelatinous Cube | gelatinousCubeBlob |
4 | Orc | orcBlob |
2.2 Tags
Table name: Tag
TagID | TagName |
1 | Flying |
2 | Walking |
3 | Yellow |
4 | Rainbow |
5 | Fast |
6 | Pink |
7 | Slow |
8 | Green |
2.3 Join/Bridge/Junction/Gerund/Cross-reference Table
Hard to know what to call these types of tables. I've seen several different names for them. This is how you do "many-to-many" relationships (each entity has many tags, and each tag tags many entities).
Table name: EntityTag
EntityTagID | EntityID | TagID |
1 | 1 | 1 |
2 | 1 | 2 |
3 | 1 | 3 |
4 | 2 | 1 |
5 | 2 | 4 |
6 | 2 | 5 |
7 | 3 | 2 |
8 | 3 | 6 |
9 | 3 | 7 |
10 | 4 | 2 |
11 | 4 | 8 |
2.4 Sample database load script
Just for grins, I took this for a test run using SQLiteStudio. Here's the load script so you don't have to type all that data in yourself:
(Note to self: I changed the metadata on this particular object in the S3 bucket to set Content-type
to
text/plain
. There's probably a wildcard way to do it, but it might need to be done after a sync?)
3 The four SQL verbs
Well, the four main ones.
3.1 SELECT – Selects rows out of a table
SELECT * FROM Entity
(Selects all the columns from the Entity table.)
SELECT EntityName FROM Entity
(Selects just the name column from the table.)
SELECT e.EntityName , COUNT(*) AS TagCount -- Counts rows, requires GROUP BY clause (below) FROM Entity e JOIN EntityTag et ON et.EntityID = e.EntityID JOIN Tag t ON t.TagID = et.TagID WHERE t.TagName IN ('Flying', 'Walking') GROUP BY e.EntityName HAVING TagCount = 2
(Or something. I haven't actually set this up and tested it. [Narrator: it works.] You could probably also do this with an "inner SELECT", but that might be (a lot) less efficient.)
(Returns the names of all entities that have both 'Flying' and 'Walking' tags. Here, we're joining all three tables together into one monstrous virtual table and specifying how to match the rows up (otherwise, we'd wind up with a cross-product between all the rows, which is (usually) not what you want.))
(In general, SQL strings use single quotes.)
(Technical note: if you were parameterizing this query, you'd probably want to pass the desired tags in as a table-typed parameter (i.e., a collection of strings, not a single string) and have another JOIN clause with the passed-in table of tag names.)
3.1.1 Basic SELECT syntax
Basically, it's:
SELECT <columns> FROM <table(s)> [WHERE <condition>] [ORDER BY <columns>]
There are lots of variations and clever extra little fillips, as you can see above.
3.1.2 Inner SELECT
SELECT e.EntityName FROM Entity e WHERE EXISTS (SELECT t.TagName FROM Tag t JOIN EntityTag et on et.TagID = t.TagID AND et.EntityID = e.EntityID WHERE t.TagName = 'Flying') AND EXISTS (SELECT t.TagName FROM Tag t JOIN EntityTag et on et.TagID = t.TagID AND et.EntityID = e.EntityID WHERE t.TagName = 'Walking')
The reason this is less efficient is that we get essentially nested loops, where we iterate over the Entity rows, and, for each row, execute two more SELECT statements, querying the database in a loop with different parameters. A lot of SQL engines can key off the EXISTING keyword and get smart with optimizations, but, still, inner SELECTS can get you into a lot of trouble, performance-wise.
3.2 INSERT – Inserts rows into a table
INSERT Entity (EntityName, EntityBlob) VALUES ('Unicorn', /unicornBlob/)
3.3 UPDATE – Updates an existing row in a table
(Fix a misspelling.)
UPDATE Tag SET TagName = 'Walking' WHERE TagName = 'Wlaking'
3.4 DELETE – Deletes a row from a table
(In real systems, this isn't often used. Instead, we usually just UPDATE a row to SET a flag to indicate that it's "logically" deleted.)
DELETE EntityTag WHERE EntityID = 1 AND TagID = 7
4 Moar Knowledge
You can fool around with SQL by installing SQLiteStudio from https://sqlitestudio.pl/.
sqlite
is a well-known lightweight implementation of a SQL database that doesn't get too
complicated (so, not multi-user, no transactions, etc. Although it may have come some distance
since I last used it – anyway, if you want a serious database that can handle concurrency, you
should try https://www.postgresql.org/).
https://www.lynda.com/SQL-training-tutorials/446-0.html?category=developer_50
Or this, I dunno: https://learntocodewith.me/posts/sql-guide/