A set intersection oracle (SIO) is a data structure which represents a collection of sets and can quickly answer queries about whether the set intersection of two given sets is non-empty.
The input to the problem is n finite sets. The sum of the sizes of all sets is N (which also means that there are at most N distinct elements). The SIO should quickly answer any query of the form:
- "Does the set Si intersect the set Sk"?
Without any pre-processing, a query can be answered by inserting the elements of Si into a temporary hash table and then checking for each element of Sk whether it is in the hash table. The query time is .
Alternatively, we can pre-process the sets and create an n-by-n table where the intersection information is already entered. Then the query time is , but the memory required is .
The SIO problem can be reduced to the approximate distance oracle (DO) problem, in the following way.[1]
- Build an undirected bipartite graph where one part contains a node for each of the n sets, and the other part contains a node for each of the (at most) N elements contained in the sets.
- There is an edge between a set and an element, iff the set contains the element.
This graph has the following properties:
- If two sets intersect, the distance between them is 2 (from one set, to an element in the intersection, to the other set).
- If two sets do not intersect, the distance between them is at least 4.
So, with a DO whose approximation factor of less than 2, we can solve the SIO problem.
It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires space to answer queries in time . If this conjecture is true, this implies that there is no DO with an approximation factor of less than 2 and a constant query time.[1]