Linked Data - Part 3

Okay, let me stop being a hypocritical bastard (see part 2) and start thinking about solutions. Here is a small recap of what we need to design:

  1. A data-structure that allows lots of freedom.
  2. Allow the managers of the database to enforce structure dynamically.
  3. Make the data-structure easily queryable using the user-enforced structure instead of a completely predefined query-language.
  4. Allow computed knowledge to be added for simple things like "If A IsOfType B and B IsOfType C then A IsOfType C", but also more complex things like "The square root of any number x can be computed as [...]". This will be somewhat of a challenge to design.
  5. Allow undefined data-structures to be added to the database as byte-streams. To be used for anything from adding floating point numbers to audio-files.
  6. Make it possible to add knowledge about these undefined data-structures, so it becomes possible to use them in queries (another tough one, yet vitally important).
  7. Make it a viable alternative to relational-databases, so more companies are willing to make the switch. It is unlikely that our model will be comparable in speed and efficiency, so we need to add other fancy features to convince them.
  8. There should be no need to have a centralized consortium that dictates standards for knowledge structures, although it should be possible that these standards will arise.
  9. It should be able to function as a completely stand-alone database, but be designed so that databases can be easily linked. Multi-database queries shouldn't be any more difficult than single-database queries.

This isn't even the complete 'todo-list', but it would be quite impressive to be able to create a decent design for realizing these points. So impressive even that I'm doubting whether it'll be possible, but that's part of the fun I suppose :). We could start by defining how to add computation as this could be used by a lot of other requirements, or maybe how databases are linked. What about the method for adding user defined structure?

Hmmm where to start, where to start.... Let's just take the "what does this button do?"-approach and just see whether things will blow up in my face. What's the worst that could happen? It's not like we'll be able to get things right from the get-go.

Author-defined structure

Structure should give assurance. If the author of the database likes to add some hierarchical structure to objects such as A IsOfType B and B IsOfType C, then the structure should enforce that C cannot be of type A. Unfortunately for us, a hierarchy is just one of the many possible structures, and we do not want to hard-code the specific structures that we are able to enforce. Let's look at an alternative to the hard-coding solution.

We will assume that somehow, in some magical way, inferring knowledge is already possible. Using inference our database also thinks that the relation A IsOfType C exists, even though this relation has never been entered by the author. Using local computation on nodes we could create a rule such as "If X IsOfType Y then it may not be that Y IsOfType X" to perform consistency checks. So the big question is: would it be possible to enforce all the structure we want by creating rules such as this?
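To make this a bit more tangible, here is a minimal Python sketch of the idea. It assumes that facts are stored as (subject, relation, object) tuples and that the "magical" inference is nothing more than a transitive closure. Both are my own simplifications, and the function names infer_transitive and symmetry_violations are made up for illustration.

```python
def infer_transitive(triples, relation):
    """Return the given triples plus everything implied by transitivity of `relation`."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        edges = [(s, o) for (s, r, o) in known if r == relation]
        for a, b in edges:
            for c, d in edges:
                if b == c and (a, relation, d) not in known:
                    known.add((a, relation, d))
                    changed = True
    return known


def symmetry_violations(triples, relation):
    """Rule: If X rel Y then it may not be that Y rel X. Returns offending (X, Y) pairs."""
    known = set(triples)
    return sorted(
        (s, o) for (s, r, o) in known
        if r == relation and (o, relation, s) in known
    )


facts = {("A", "IsOfType", "B"), ("B", "IsOfType", "C")}

# A IsOfType C was never entered, but it is inferred.
closed = infer_transitive(facts, "IsOfType")

# Adding C IsOfType A creates a cycle, which the rule should catch.
broken = infer_transitive(facts | {("C", "IsOfType", "A")}, "IsOfType")
```

Note that the check runs on the inferred facts, not just on the entered ones. Without inference, the cycle A, B, C, A would slip through, because no single entered pair is directly reversed.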

Let's say the author is able to add rules such as "If N1 R1 N2 then it M be that N3 R2 N4", where M is either Must or MayNot, N1, N2, N3 and N4 are each either a specific node or a variable, and R1 and R2 are each either a specific relation-type or a variable relation-type. Let's just try it out.
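A rule of this shape could be evaluated roughly as in the sketch below. Everything here is an assumption on my part: facts are (subject, relation, object) tuples, variables are strings starting with "?", and a variable that appears only in the consequence is treated as a plain wildcard. The names is_var, match and violations are hypothetical.

```python
def is_var(term):
    """Variables are written as strings starting with '?', e.g. '?X'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` equals `triple`; None on failure."""
    result = dict(bindings)
    for p, value in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return result


def violations(triples, rule):
    """Return the triples that trigger `rule` but break its consequence.

    rule = (condition_pattern, "Must" | "MayNot", consequence_pattern)
    """
    condition, mode, consequence = rule
    found = []
    for triple in sorted(triples):
        bindings = match(condition, triple, {})
        if bindings is None:
            continue
        satisfied = any(match(consequence, t, bindings) is not None for t in triples)
        if (mode == "Must" and not satisfied) or (mode == "MayNot" and satisfied):
            found.append(triple)
    return found


# "If X IsOfType Y then it MayNot be that Y IsOfType X"
no_cycle = (("?X", "IsOfType", "?Y"), "MayNot", ("?Y", "IsOfType", "?X"))

# "If X IsA Person then it Must be that X HasFirstName Y"
needs_first_name = (("?X", "IsA", "Person"), "Must", ("?X", "HasFirstName", "?Y"))

facts = {
    ("A", "IsOfType", "B"),
    ("B", "IsOfType", "A"),
    ("JLP", "IsA", "Person"),
    ("JLP", "HasLastName", "Picard"),
}
```

Here violations(facts, no_cycle) reports both IsOfType facts, and violations(facts, needs_first_name) reports the JLP IsA Person fact, since no first name was entered.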

Enforcing a tabular structure

If we are enforcing a tabular structure, we want to have a value for each cell in the table. Even for cells that we don't have a value for, we'll need to define the value as nil (or something similar).

  • If X IsA Person then it Must be that X HasFirstName Y
  • If X IsA Person then it Must be that X HasLastName Y
  • If X IsA Person then it Must be that X HasNickName Y

If we are creating a person "Jean-Luc Picard" by adding these relations:

  • JLP IsA Person
  • JLP HasFirstName Jean-Luc
  • JLP HasLastName Picard

then I'm expecting the database to give me an inconsistency-error because we are missing "JLP HasNickName 'captain'" or "JLP HasNickName 'Number Zero'", whichever you prefer.
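As a rough Python sketch of that inconsistency check: the three Must rules are reduced to (type, required relation) pairs, and facts are plain (subject, relation, object) tuples. The function name missing_required and this reduced rule format are assumptions made only for this example.

```python
# Each entry stands for "If X IsA <type> then it Must be that X <relation> Y".
PERSON_RULES = [
    ("Person", "HasFirstName"),
    ("Person", "HasLastName"),
    ("Person", "HasNickName"),
]


def missing_required(triples, rules):
    """Return (subject, relation) pairs for every required relation that is absent."""
    errors = []
    for subject, relation, obj in sorted(triples):
        if relation != "IsA":
            continue
        for type_name, required in rules:
            if obj != type_name:
                continue
            has_value = any(s == subject and r == required for (s, r, _) in triples)
            if not has_value:
                errors.append((subject, required))
    return errors


jlp = {
    ("JLP", "IsA", "Person"),
    ("JLP", "HasFirstName", "Jean-Luc"),
    ("JLP", "HasLastName", "Picard"),
}

errors = missing_required(jlp, PERSON_RULES)
fixed = missing_required(jlp | {("JLP", "HasNickName", "captain")}, PERSON_RULES)
```

With only the three relations entered, errors holds the missing HasNickName requirement. Once a nickname is added, the list is empty.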

Hmm, but what if we added both? Although it is okay for a person to have two nicknames, shouldn't a tabular structure like this enforce only one value? Well maybe we can support rules such as:

  • If X HasNickName Y then it MayNot be that X HasNickName Z

We could interpret variables that are used only once within a rule as values that may not be equal to any other variable that is also used only once. This solution is simple enough, although it seems a bit hacky. It clearly shows the limits of enforcing structure using these simple rules.
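Under that interpretation, the MayNot rule above boils down to "no subject may have two different values for the same relation". A small sketch, assuming (subject, relation, object) tuples and a hypothetical helper named multi_valued:

```python
from collections import defaultdict


def multi_valued(triples, relation):
    """Rule: If X rel Y then it MayNot be that X rel Z, where Y and Z must differ.

    Returns {subject: sorted values} for every subject that breaks the rule.
    """
    values = defaultdict(set)
    for subject, rel, obj in triples:
        if rel == relation:
            values[subject].add(obj)
    return {s: sorted(v) for s, v in values.items() if len(v) > 1}


facts = {
    ("JLP", "IsA", "Person"),
    ("JLP", "HasNickName", "captain"),
    ("JLP", "HasNickName", "Number Zero"),
}

conflicts = multi_valued(facts, "HasNickName")
```

The "must differ" part matters: if Y and Z were allowed to be the same value, every single nickname would violate the rule against itself.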

Combining with inferred knowledge

Let's say we want to enforce a structure where a person may have at most 3 nicknames. I hope you agree with me that adding relations such as the ones below is completely idiotic (I won't even talk about why):

  • JLP HasNickname1 'captain'
  • JLP HasNickname2 'Number 0'
  • JLP HasNickname3 'Steward'

An alternative would be to infer cardinal knowledge. We have yet to talk about how we allow data to be inferred, but assuming it is powerful enough, it could supply knowledge such as "JLP HasNicknameCount three", "JLP HasAtLeastOne Nickname", "JLP HasAtLeastTwo Nickname" and "JLP HasAtLeastThree Nickname". With inferred knowledge such as this it becomes easy, but we should probably see this as shifting the problem to a different area without really solving the issue at hand. Still, combining inferred knowledge with simplistic consistency-rules would be better than creating rules such as "If [Some very complex rules] then data is (not) consistent". Author-defined structures are often relatively simple, but let's not use that as an excuse for our lack of effort.

Besides allowing the author to enforce structure, the rules also need to be usable for letting the author construct simplified queries. We need to limit the complexity of consistency rules so that they stay easy for our model to understand. Allowing queries to be extended with author-defined syntax is hard enough as it is; we do not need to add complex consistency predicates to it, do we? Well... who knows, maybe we'll look into it later.
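Just to see what inferring cardinal knowledge might look like, here is a small Python sketch. It assumes (subject, relation, object) tuples, and it writes the inferred relations as HasAtLeast1, HasAtLeast2 and so on instead of spelling the numbers out. The helper names infer_counts and exceeds_limit are made up, and the fourth nickname is invented purely to trigger the rule.

```python
def infer_counts(triples, relation, label):
    """Infer 'X Has<label>Count n' and 'X HasAtLeast<k> <label>' for k = 1..n."""
    per_subject = {}
    for subject, rel, value in set(triples):
        if rel == relation:
            per_subject.setdefault(subject, set()).add(value)
    inferred = set()
    for subject, values in per_subject.items():
        inferred.add((subject, f"Has{label}Count", len(values)))
        for k in range(1, len(values) + 1):
            inferred.add((subject, f"HasAtLeast{k}", label))
    return inferred


def exceeds_limit(inferred, label, limit):
    """Simple rule: it MayNot be that X HasAtLeast<limit + 1> <label>."""
    relation = f"HasAtLeast{limit + 1}"
    return sorted(s for (s, r, o) in inferred if r == relation and o == label)


facts = {
    ("JLP", "HasNickName", "captain"),
    ("JLP", "HasNickName", "Number 0"),
    ("JLP", "HasNickName", "Steward"),
}

inferred = infer_counts(facts, "HasNickName", "Nickname")
ok = exceeds_limit(inferred, "Nickname", 3)

# A made-up fourth nickname pushes JLP over the limit.
too_many = infer_counts(facts | {("JLP", "HasNickName", "Extra")}, "HasNickName", "Nickname")
offenders = exceeds_limit(too_many, "Nickname", 3)
```

The consistency rule itself stays trivial (a single MayNot on one inferred relation); all the counting has moved into the inference step, which is exactly the "shifting the problem" mentioned above.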

Consistency of consistency rules?

Our example about the tabular data structure shows another big weakness. We had a structure-type in mind and created consistency rules that would enforce storage of knowledge that complies with our idea of what a table should be like. We never really stored the knowledge of what the tabular-structure should be like and adding tons of consistency rules for every tabular structure seems a bit redundant. If we need knowledge about some structure to be duplicated each and every time, we can't expect the knowledge about the structure to ever become standardized. Can't we just add the knowledge like below and be done with it?

  • Person EnforcesStructure Tabular
  • Person HasProperty HasFirstName
  • Person HasProperty HasLastName
  • Person HasProperty HasNickName

This would save us from writing consistency rules each and every time, but how is this supposed to work? Well, with a minimal extension of our idea of consistency rules we could manage it like this:

  • If Q EnforcesStructure Tabular and Q HasProperty S and R IsA Q then it Must be that R S T
  • If Q EnforcesStructure Tabular and Q HasProperty S and R IsA Q and R S T then it MayNot be that R S U
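The two conjunctive rules above could be checked along the lines of the Python sketch below. It assumes (subject, relation, object) tuples and hard-codes the two rules into one function instead of parsing them, so it is an illustration of the intended behaviour and not a general rule engine. The name check_tabular is hypothetical.

```python
def check_tabular(triples):
    """Check both Tabular rules for every Q that EnforcesStructure Tabular.

    Returns (row, property, problem) tuples, where problem is 'missing'
    (first rule broken) or 'duplicate' (second rule broken).
    """
    facts = set(triples)
    tabular = {s for (s, r, o) in facts if r == "EnforcesStructure" and o == "Tabular"}
    errors = []
    for q in sorted(tabular):
        props = sorted(o for (s, r, o) in facts if s == q and r == "HasProperty")
        rows = sorted(s for (s, r, o) in facts if r == "IsA" and o == q)
        for row in rows:
            for prop in props:
                values = {o for (s, r, o) in facts if s == row and r == prop}
                if not values:
                    errors.append((row, prop, "missing"))
                elif len(values) > 1:
                    errors.append((row, prop, "duplicate"))
    return errors


schema = {
    ("Person", "EnforcesStructure", "Tabular"),
    ("Person", "HasProperty", "HasFirstName"),
    ("Person", "HasProperty", "HasLastName"),
    ("Person", "HasProperty", "HasNickName"),
}

jlp = {
    ("JLP", "IsA", "Person"),
    ("JLP", "HasFirstName", "Jean-Luc"),
    ("JLP", "HasLastName", "Picard"),
}

missing = check_tabular(schema | jlp)
duplicate = check_tabular(schema | jlp | {
    ("JLP", "HasNickName", "captain"),
    ("JLP", "HasNickName", "Number Zero"),
})
```

The nice part is that the schema is now ordinary data: declaring another tabular type only takes a few more triples, and no new rules.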

Adding conjunction to consistency-rules might be a good idea, but then again it might not. I'm not sure yet; I like consistency rules to be as simple as possible. This addition raises the question of whether we should also allow disjunction or other boolean operations. Maybe we need to look at how queries can be simplified with the help of author-defined consistency rules before we make a decision.