I've done a lot of reading about SQL and database structures. One of the most helpful resources has been:
http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database
which gives an overview of many approaches, with links to more information. However, reading it has raised several questions. I'm looking to build a database and am trying to choose the best structure for an extremely large number of sets. It will be leaf heavy, but will also have a moderate number of middle nodes. Most inserts will happen up front, with few inserts and removals afterwards. However, there will be frequent updates to information within all of the nodes, so navigating to specific items through their lineage is important. Let's use a business employee model with 4 tables to represent my case. They link as follows:
Region   Manager
     \   /
     Store
       |
   Employees
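To make this concrete, here is a minimal sketch of the four tables as I picture them, using Python's built-in sqlite3 module. All table and column names (region, manager, store, employee, region_id, manager_id, store_id) are placeholders I made up for illustration, not a settled design.

```python
import sqlite3

# Hypothetical schema: every table and column name here is a placeholder.
SCHEMA = """
CREATE TABLE region  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE manager (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE store (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    region_id  INTEGER NOT NULL REFERENCES region(id),   -- parent 1
    manager_id INTEGER NOT NULL REFERENCES manager(id)   -- parent 2
);
CREATE TABLE employee (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    store_id INTEGER NOT NULL REFERENCES store(id)
);
"""

def make_db():
    """Create an in-memory database with the four placeholder tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn

conn = make_db()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Each store row carries the ids of both of its parents, and each employee row carries the id of its store, so the lineage lives in plain integer columns rather than in a path string.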
Now, I've seen the examples on MSDN that use a join statement to link tables, but I assumed that a join goes linearly through every row of a table, binding rows and checking whether the condition (if a where clause is used) is satisfied. Is my assumption correct? This is what I understand to be a type of adjacency list, and to me it would be way too slow for my application; I need something with fast queries for a large dataset.
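One way I could probably test this assumption myself is SQLite's EXPLAIN QUERY PLAN, which reports how each table in a query will be accessed. Below is a small sketch of that check; the employee table, the store_id column, and the index name idx_employee_store_id are my own placeholders, and I haven't confirmed how any of this behaves on a large dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Placeholder table: names are assumptions for illustration only.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, store_id INTEGER)"
)

QUERY = "SELECT name FROM employee WHERE store_id = ?"

def plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

before = plan(conn, QUERY, (1,))  # no index on store_id yet
conn.execute("CREATE INDEX idx_employee_store_id ON employee (store_id)")
after = plan(conn, QUERY, (1,))   # same query, index now available

print(before)
print(after)
```

As far as I can tell, a SCAN entry means the whole table is walked, while a SEARCH ... USING INDEX entry means rows are located through an index. The exact wording of the plan text varies between SQLite versions.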
The following methods, as I understand them, don't partition the data into separate tables and instead keep all the information in one table.
Nested sets is another option, but I dislike how intensive the insert/move/delete procedures are. I also read that recursive procedures can incur major costs, but is that still true given that I'm using SQLite?
I saw the materialized path/lineage column approach, which sounds pretty great but requires string analysis. Plus, I don't have a single lineage. I figured I could work around the lineage problem by using the sign of the id in the lineage (a region would have a negative id while a manager would have a positive one).
The last method described was multiple lineage columns, which imposes a maximum nesting depth, but that is not an issue in my case since I have a fixed number of levels.
I like the last method the best; still, I'd rather just store everything in 4 different tables with parent/child ids, as in adjacency lists. However, I don't want to linearly search each table, as that would be expensive for queries on the employee table. Is it possible to linearly search the regions (there would be far fewer of them), find each region's children (potentially multiple) in the store table, and then find all the employees of those stores without linearly searching each subsequent table for matches? Or is it silly not to just use a join/where query to get the information?
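For reference, the join/where query I have in mind would look roughly like the sketch below. The tables, columns, and sample rows are all invented for illustration, and I've left the manager table out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Placeholder tables and sample data, made up for this example.
conn.executescript("""
CREATE TABLE region   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store    (id INTEGER PRIMARY KEY, name TEXT,
                       region_id INTEGER REFERENCES region(id));
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                       store_id INTEGER REFERENCES store(id));

INSERT INTO region   VALUES (1, 'North'), (2, 'South');
INSERT INTO store    VALUES (1, 'Store A', 1), (2, 'Store B', 1), (3, 'Store C', 2);
INSERT INTO employee VALUES (1, 'Bob', 1), (2, 'Cara', 2), (3, 'Dan', 3);
""")

def employees_in_region(conn, region_id):
    """Walk region -> store -> employee with two joins and one filter."""
    sql = """
        SELECT s.name, e.name
        FROM region r
        JOIN store s    ON s.region_id = r.id
        JOIN employee e ON e.store_id  = s.id
        WHERE r.id = ?
        ORDER BY e.id
    """
    return conn.execute(sql, (region_id,)).fetchall()

north = employees_in_region(conn, 1)
print(north)  # [('Store A', 'Bob'), ('Store B', 'Cara')]
```

What I can't tell from the query alone is how the engine finds the matching store and employee rows, which is really the heart of my question.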
I guess I'm still quite lost when it comes to building and querying my database. If I could get some more information on how these techniques iterate through the tables, plus suggestions on how to set up and query this table layout, I would be very grateful! Thanks for taking the time to read this!
Best,
Jeremy