As I speak with people from various technical backgrounds, I find there is lot of confusion about how to structure data. For a full background on this and related topics I rely upon the work of Fabian Pascal, C.J. Date, and C.F. Codd, vibrantly maintained on Pascal’s site, dbdebunk.com. I encourage anyone who favors “learning” over “training” to spend some time there.
When I address a new challenge of database development or workflow design, I start by distinguishing between Transaction data and Reference data. Transaction data are those that accumulate over time, but do not change frequently. There are a lot of INSERTs, but not many UPDATEs. Orders, invoices, calls, observations, and events are all types of Transaction data. Transaction data may include attributes that contain Reference data.
Reference data are those that remain mostly static over time, but once established, change infrequently. Examples include addresses, phone numbers, product lists, clients, vendors, etc. Reference data grows at a much slower pace than Transaction data, so much of the maintenance involves de-duplication and consistency of form. For instance, address data can be better de-duplicated if at first you make all the street types consistent: i.e. “Avenue” and “Ave” and “Av” are all made to be the same “Ave”. If you can do this BEFORE inserting it into a database, that saves A LOT of time and resources.
By contrast, Transaction data maintenance is more focused on removing data that are no longer needed. The most efficient way to manage Transaction data in a database is to 1) not load attributes that will never be used, and 2) partition data based on attributes that reflect its useful life. This makes it easier to remove it without scanning all the data you need to keep.
So the next time you are thinking about loading up your linked lists, name-value pairs, and dictionaries into a database, remember that you only need to save what you will use. Kind of like your refrigerator. If you are not going to eat it, you don’t need to wrap it up or freeze it. Collect only what you need. Determining what you “need” is of course a prerequisite.