Limitations
It is very important to apply the principle of least privilege when defining user roles and privileges. Further to that, Neo4j’s role-based access control has some limitations and implications that users should be aware of, such as:
- Impact on query results regardless of whether indexes are used.
- Impact on query results when nodes have multiple labels.
- The need for careful management of user roles and privileges to avoid unintended data exposure.
- Potential performance impacts when querying large graphs with complex security rules.
Security and indexes
Neo4j lets you create and use indexes to speed up Cypher queries. See the Cypher Manual → Indexes for more details on the different types of indexes available in Neo4j.
However, Neo4j’s security model still controls what results you see, regardless of whether or not you use indexes. For example, when you use search-performance (non-full-text) indexes, queries return the same results they would without any index. This means that if the security model causes fewer results to be returned due to restricted read access in graph and sub-graph access control, the index returns the same reduced set of results.
Full-text indexes work differently. These indexes use Lucene under the hood. Because of that, Neo4j cannot check whether a security violation has affected each specific entry returned from the index. So, if there is any chance a result might violate active security privileges for a query, Neo4j returns zero results from the full-text indexes.
Also, Cypher does not use full-text indexes automatically; you have to explicitly call procedures to use them. This avoids a situation where the same Cypher query would return different results simply because such an index exists. However, if you are not aware of this behavior, you might expect the full-text index to return the same results as a different but semantically similar Cypher query.
Example with denied properties
Consider the following example.
The database has nodes with labels :User and :Person, and they have properties name and surname.
There are indexes on both properties:
CREATE INDEX singleProp FOR (n:User) ON (n.name);
CREATE INDEX composite FOR (n:User) ON (n.name, n.surname);
CREATE FULLTEXT INDEX userNames FOR (n:User|Person) ON EACH [n.name, n.surname];
Full-text indexes support multiple labels. See Cypher Manual → Indexes for full-text search for more details on creating and using full-text indexes.
After creating these indexes, it may look as if the latter two indexes accomplish the same thing. However, this is not completely accurate. The composite and full-text indexes behave in different ways and are focused on different use cases. A key difference is that full-text indexes are backed by Lucene and use the Lucene syntax for querying.
This has consequences for users restricted on the labels or properties involved in the indexes. Ideally, if the labels and properties in the index are denied, queries correctly return zero results from both native indexes and full-text indexes. However, there are borderline cases where this is not that simple.
Imagine the following nodes are added to the database:
CREATE (:User {name: 'Sandy'});
CREATE (:User {name: 'Mark', surname: 'Andy'});
CREATE (:User {name: 'Andy', surname: 'Anderson'});
CREATE (:User:Person {name: 'Mandy', surname: 'Smith'});
CREATE (:User:Person {name: 'Joe', surname: 'Andy'});
Consider denying the label :Person:
DENY TRAVERSE ON GRAPH * NODES Person TO users
If the user runs a query that uses the native single property index on name:
MATCH (n:User) WHERE n.name CONTAINS 'ndy' RETURN n.name
This query performs several checks:
- Scans the index to create a stream of results of nodes with the name property, which leads to five results.
- Filters the results to include only nodes where n.name CONTAINS 'ndy', filtering out Mark and Joe, which leads to three results.
- Filters the results to exclude nodes that also have the denied label :Person, filtering out Mandy, which leads to two results.
Two results will be returned from this dataset and only one of them has the surname property.
In order to use the native composite index on name and surname, the query needs to include a predicate on the surname property as well:
MATCH (n:User)
WHERE n.name CONTAINS 'ndy' AND n.surname IS NOT NULL
RETURN n.name
This query performs several checks, which are almost identical to the single property index query:
- Scans the index to create a stream of results of nodes with the name and surname properties, which leads to four results.
- Filters the results to include only nodes where n.name CONTAINS 'ndy', filtering out Mark and Joe, which leads to two results.
- Filters the results to exclude nodes that also have the denied label :Person, filtering out Mandy, which leads to only one result.
Only one result was returned from the above dataset. What if this query with the full-text index was used instead:
CALL db.index.fulltext.queryNodes("userNames", "ndy") YIELD node, score
RETURN node.name
The problem now is that it is not certain whether the results provided by the index are achieved due to a match to the name or the surname property.
The steps taken by the query engine would be:
- Run a Lucene query on the full-text index to produce results containing ndy in either property, leading to five results.
- Filter the results to exclude nodes that also have the label :Person, filtering out Mandy and Joe, leading to three results.
This difference in results is caused by the OR relationship between the two properties in the index creation.
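The three walkthroughs above can be reproduced with a small Python model. This is only an illustrative sketch and not how Neo4j is implemented: the node list mirrors the CREATE statements, the helper names (`visible`, `single_prop_query`, `composite_query`, `fulltext_query`) are made up for this example, and plain substring matching stands in for the Lucene query, as in the description above.

```python
# Toy model of the example dataset; not Neo4j code.
NODES = [
    {"labels": {"User"}, "name": "Sandy"},
    {"labels": {"User"}, "name": "Mark", "surname": "Andy"},
    {"labels": {"User"}, "name": "Andy", "surname": "Anderson"},
    {"labels": {"User", "Person"}, "name": "Mandy", "surname": "Smith"},
    {"labels": {"User", "Person"}, "name": "Joe", "surname": "Andy"},
]
DENIED_LABELS = {"Person"}  # DENY TRAVERSE ON GRAPH * NODES Person TO users


def visible(node):
    """A node is filtered out if it carries any denied label."""
    return not (node["labels"] & DENIED_LABELS)


def single_prop_query(term):
    # Index on name -> predicate on name -> security filter.
    indexed = [n for n in NODES if "name" in n]
    matched = [n for n in indexed if term in n["name"]]
    return [n["name"] for n in matched if visible(n)]


def composite_query(term):
    # The composite index only holds nodes that have both properties.
    indexed = [n for n in NODES if "name" in n and "surname" in n]
    matched = [n for n in indexed if term in n["name"]]
    return [n["name"] for n in matched if visible(n)]


def fulltext_query(term):
    # A hit in either property counts (the OR between indexed properties).
    matched = [
        n for n in NODES
        if term in n.get("name", "") or term in n.get("surname", "")
    ]
    return [n["name"] for n in matched if visible(n)]


print(single_prop_query("ndy"))  # ['Sandy', 'Andy']
print(composite_query("ndy"))    # ['Andy']
print(fulltext_query("ndy"))     # ['Sandy', 'Mark', 'Andy']
```

Mark appears only in the full-text result because his match comes from surname, which the two native-index queries never test.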
Denying properties
Now consider denying access on properties, like the surname property:
DENY READ {surname} ON GRAPH * TO users
For that, run the same queries again:
MATCH (n:User)
WHERE n.name CONTAINS 'ndy'
RETURN n.name
This query operates exactly as before, returning the same two results, because nothing in it relates to the denied property.
However, this is not the same for the query targeting the composite index:
MATCH (n:User)
WHERE n.name CONTAINS 'ndy' AND n.surname IS NOT NULL
RETURN n.name
Since the surname property is denied, it will appear to always be null and the composite index empty. Therefore, the query returns no result.
Now consider the full-text index query:
CALL db.index.fulltext.queryNodes("userNames", "ndy") YIELD node, score
RETURN node.name
The problem remains, since it is not certain whether the results provided by the index were returned due to a match on the name or the surname property.
Results from the surname property now need to be excluded by the security rules, because they require that the user is unable to see any surname properties.
However, the security model is not able to introspect the Lucene query in order to know what it will actually do, whether it works only on the allowed name property, or also on the disallowed surname property.
What is known is that the earlier query returned a match for Joe Andy which should now be filtered out.
Therefore, in order to never return results the user should not be able to see, all results need to be blocked.
The steps taken by the query engine would be:
- Determine if the full-text index includes denied properties.
- If yes, return an empty results stream. Otherwise, it will process as described before.
In this case, the query will return zero results rather than simply returning the results Andy and Sandy, which might have been expected.
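The all-or-nothing behavior can be sketched in Python as follows. This is a toy model under stated assumptions, not Neo4j internals: the list holds only the three :User nodes that remain visible after the :Person label is denied, the helper names are invented for illustration, and the last call uses a hypothetical full-text index on name alone to show when results would pass through.

```python
# Toy model of the DENY READ {surname} case; not Neo4j code.
# Assumption: only the nodes still visible after denying :Person are listed.
NODES = [
    {"name": "Sandy"},
    {"name": "Mark", "surname": "Andy"},
    {"name": "Andy", "surname": "Anderson"},
]
DENIED_PROPERTIES = {"surname"}  # DENY READ {surname} ON GRAPH * TO users


def read(node, key):
    """A denied property always looks like null (None) to the user."""
    if key in DENIED_PROPERTIES:
        return None
    return node.get(key)


def composite_query(term):
    # n.surname IS NOT NULL is never true, so no node passes.
    return [
        n["name"] for n in NODES
        if read(n, "surname") is not None and term in (read(n, "name") or "")
    ]


def fulltext_query(index_properties, term):
    # The index cannot say which property produced a hit, so any overlap
    # with the denied properties blocks the whole result stream.
    if set(index_properties) & DENIED_PROPERTIES:
        return []
    return [
        n["name"] for n in NODES
        if any(term in n.get(p, "") for p in index_properties)
    ]


print(composite_query("ndy"))                      # []
print(fulltext_query(["name", "surname"], "ndy"))  # []
print(fulltext_query(["name"], "ndy"))             # ['Sandy', 'Andy']
```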
Avoiding fail-open DENY behavior
A DENY rule fails open when its criteria are not met: Neo4j does not apply the restriction, so access is granted whenever a broader GRANT exists.
This can lead to unintended data exposure if the DENY rule is not carefully crafted.
To avoid this, you can apply the principle of least privilege and allow access only to the specific data that the user should see.
For example, consider the following scenarios:
DENY failing open with property-based RBAC
You grant a user access to a property and try to restrict it with a DENY rule.
However, if the DENY rule does not match any data, for example, if the property is null or misspelled, the DENY rule will not apply, and the user can still access the property.
GRANT READ {salary} ON GRAPH * NODES Employee TO myRole
DENY READ {salary} ON GRAPH * FOR (e:Employee) WHERE e.position = 'CEO' TO myRole
In this case, if the e.position property is null or misspelled, the DENY rule will not apply, and myRole will see the salary property.
A better way is to apply the principle of least privilege and only grant access to the salary property for employees whose position is not 'CEO'.
GRANT READ {salary} ON GRAPH * FOR (e:Employee) WHERE e.position <> 'CEO' TO myRole
Or, if for some reason using DENY is unavoidable, the problem can be mitigated by adding an additional DENY to cover the case where e.position is null:
DENY READ {salary} ON GRAPH * FOR (e:Employee) WHERE e.position IS NULL TO myRole
This way, if e.position is null, the additional DENY applies and the user will not see the salary property.
Alternatively, you can add a constraint to ensure that the e.position property cannot be null, so the DENY condition is always checkable:
CREATE CONSTRAINT FOR (e:Employee) REQUIRE e.position IS NOT NULL;
This way, the DENY will never fail open because of null values, and the user will not see the salary property for employees whose position is 'CEO'.
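The effect of a null position on each variant can be made concrete with a short Python sketch. It is a simplified model, not Neo4j's evaluation code: it assumes, based on the behavior described above, that a property rule matches only when its predicate is exactly true, and that comparing with null yields null. All function and variable names are invented for the example.

```python
# Toy model of property-based rules with Cypher-style null handling.
def eq(a, b):
    """Cypher-like '=': comparing with null yields null (None)."""
    return None if a is None or b is None else a == b


def ne(a, b):
    """Cypher-like '<>': comparing with null yields null (None)."""
    return None if a is None or b is None else a != b


def can_read_salary(position, grants, denies):
    # Assumption: a rule matches only when its predicate is exactly True;
    # None (null) counts as "no match" for GRANT and DENY alike.
    granted = any(rule(position) is True for rule in grants)
    denied = any(rule(position) is True for rule in denies)
    return granted and not denied


def always(position):
    return True  # GRANT READ {salary} ... NODES Employee


broad_grant_with_deny = dict(grants=[always], denies=[lambda p: eq(p, "CEO")])
narrow_grant = dict(grants=[lambda p: ne(p, "CEO")], denies=[])
deny_plus_null_deny = dict(
    grants=[always],
    denies=[lambda p: eq(p, "CEO"), lambda p: p is None],
)

for position in ("CEO", "Engineer", None):
    print(
        position,
        can_read_salary(position, **broad_grant_with_deny),
        can_read_salary(position, **narrow_grant),
        can_read_salary(position, **deny_plus_null_deny),
    )
```

In this model, only the first variant exposes the salary when position is null. The narrow GRANT and the extra null DENY both keep it hidden.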
DENY failing open with label-based RBAC
In a similar way, a DENY rule will not apply when it does not match the data, leaving an overly broad GRANT in effect.
GRANT READ {salary} ON GRAPH * NODES * TO myRole;
This grants read access to the salary property on all nodes, including those that should not be accessible.
Then, you try to restrict it with a DENY rule to prevent access to the salary property on nodes labeled Management:
DENY READ {salary} ON GRAPH * NODES Management TO myRole;
In this case, if the Management label is not present on a node that has the salary property, the DENY rule will not apply, and myRole will still see the salary property on that node.
A better way is to apply the principle of least privilege and only grant access to the salary property for nodes that have a specific label, such as IndividualContributor:
GRANT READ {salary} ON GRAPH * NODES IndividualContributor TO myRole;
This way, the user will only see the salary property on nodes that have the IndividualContributor label, and not on any other nodes.
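A minimal Python sketch of the two label-based setups is shown below. It is an illustration only: the `Executive` label is a hypothetical example of a node that should have been protected but lacks the Management label, and `"*"` stands in for NODES *.

```python
# Toy model of label-based READ rules; not Neo4j code.
def can_read_salary(labels, granted_labels, denied_labels):
    # "*" in granted_labels stands for NODES *.
    granted = "*" in granted_labels or bool(labels & granted_labels)
    denied = bool(labels & denied_labels)
    return granted and not denied


broad = dict(granted_labels={"*"}, denied_labels={"Management"})
narrow = dict(granted_labels={"IndividualContributor"}, denied_labels=set())

for labels in ({"Management"}, {"Executive"}, {"IndividualContributor"}):
    print(sorted(labels),
          can_read_salary(labels, **broad),
          can_read_salary(labels, **narrow))
```

With the broad GRANT plus DENY, the hypothetical `Executive` node slips through. With the narrow GRANT it stays hidden without any DENY.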
Security and labels
Traversing the graph with multi-labeled nodes
In Neo4j, nodes can have multiple labels, but relationships only have one type. This is important when it comes to controlling who can see what.
The following section only focuses on nodes because they can have multiple labels. The same general rules apply to relationships, but they are simpler.
For details on the general influence of access control privileges on graph traversal, see Graph and sub-graph access control.
If a user is granted access to a traversable node using GRANT TRAVERSE or GRANT MATCH, they will be able to get information about the attached labels by calling the built-in labels() function.
In the case of nodes with multiple labels, this means that the user will be able to see all labels attached to the node, even if they were not granted access to traverse on some of those labels.
For example, if a user has the following role:
GRANT TRAVERSE ON GRAPH * NODES A TO custom
And the graph contains three nodes: one labeled :A, another labeled :B, and one with both labels :A and :B.
If the user executes the following query:
MATCH (n:A)
RETURN n, labels(n)
They will get a result with two nodes: the node with label :A and the node with labels :A and :B.
In contrast, if the user executes:
MATCH (n:B)
RETURN n, labels(n)
They will get only the node that has both labels: :A and :B.
Even though the user has not been granted traversal on :B, there is one node with that label accessible in the dataset due to the allow-listed label :A that is attached to the same node.
If a user is denied traversal on a label, they will never get results from any node that has this label attached to it. Thus, the label name will never show up for them. For example, if the user has the following role:
DENY TRAVERSE ON GRAPH * NODES B TO custom
And the graph contains the same three nodes as before, the user will not be able to traverse any node that has the label :B.
Thus, the query
MATCH (n:A)
RETURN n, labels(n)
will now return only the node labeled solely with :A, while the query
MATCH (n:B)
RETURN n, labels(n)
will now return no nodes.
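The four query outcomes above follow from one rule: a node is traversable when at least one of its labels is granted and none is denied. The Python sketch below is a simplified model of that rule rather than Neo4j's implementation. The `match` helper is hypothetical, and the second scenario assumes the GRANT on :A is still in place alongside the DENY on :B.

```python
# Toy model of label-based TRAVERSE rules; not Neo4j code.
NODES = [{"A"}, {"B"}, {"A", "B"}]  # the three nodes from the example


def match(label, granted, denied):
    """Return labels(n) for each node the user gets from MATCH (n:<label>)."""
    result = []
    for labels in NODES:
        traversable = bool(labels & granted) and not (labels & denied)
        if traversable and label in labels:
            result.append(sorted(labels))  # labels(n) shows every label
    return result


# GRANT TRAVERSE ON GRAPH * NODES A TO custom
print(match("A", granted={"A"}, denied=set()))  # [['A'], ['A', 'B']]
print(match("B", granted={"A"}, denied=set()))  # [['A', 'B']]

# ... plus DENY TRAVERSE ON GRAPH * NODES B TO custom
print(match("A", granted={"A"}, denied={"B"}))  # [['A']]
print(match("B", granted={"A"}, denied={"B"}))  # []
```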
The db.labels() procedure
In contrast to the normal graph traversal described in the previous section, the built-in db.labels() procedure does not process the data graph itself, but the security rules defined on the system graph.
That means:
- If a label is explicitly allow-listed (granted), it will be returned by this procedure.
- If a label is denied or is not explicitly allowed, it will not be returned by this procedure.
For example, if a user has the following role:
GRANT TRAVERSE ON GRAPH * NODES A TO custom
and the graph contains three nodes: one labeled :A, another labeled :B, and one with both labels :A and :B, the user will be able to execute the following query:
CALL db.labels()
This will return a list of labels, which in this case will only include the label :A.
The label :B will not be returned, because the user does not have access to traverse on it.
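The contrast with the labels() function can be sketched in a few lines of Python. This is a deliberately simplified model: it ignores wildcard grants and assumes every granted label already exists in the database, and both helper names are made up for the example.

```python
# Toy model contrasting labels(n) with db.labels(); not Neo4j code.
GRANTED_TRAVERSE = {"A"}   # GRANT TRAVERSE ON GRAPH * NODES A TO custom
DENIED_TRAVERSE = set()    # no DENY rules in this example
NODES = [{"A"}, {"B"}, {"A", "B"}]


def labels_function(node_labels):
    """labels(n) on a traversable node reports every label stored on it."""
    return sorted(node_labels)


def db_labels():
    """db.labels() is driven by the privilege rules, not the stored nodes."""
    return sorted(GRANTED_TRAVERSE - DENIED_TRAVERSE)


print(labels_function(NODES[2]))  # ['A', 'B']
print(db_labels())                # ['A']
```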
Privileges for non-existing labels, relationship types, and property names
Privileges for non-existent labels, relationship types, and property names have an effect only once the latter are created. In other words, when authorizing a user, only privileges for existing labels, relationship types, and property names are applied. This is because the graph elements must be resolved internally to be able to check against the privileges when users try to use them later. If a label, relationship type, or property name does not yet exist, it will not resolve, and therefore, the privileges will not apply.
A way around this is to create the label, relationship type, or property name using the db.createLabel(), db.createRelationshipType(), and db.createProperty() procedures on the relevant database when creating the privileges.
Labels, relationship types, and property names are considered non-existent in a database if:
- There has never been a node with that label, a relationship with that relationship type, or a property with that name.
- There has been no attempt to add a node with that label, a relationship with that relationship type, or a property with that name. The attempted creation adds it to the known labels, relationship types, and property names even if the creation itself fails (unless it fails on missing or denied privileges to create new labels, relationship types, or property names).
- They have not been created using any of the db.createLabel(), db.createRelationshipType(), or db.createProperty() procedures.
There is currently no way to remove a label, relationship type, or property name from the database. Once existent in the database, they cannot return to non-existent.
For example, let’s assume that you have a new, freshly-created empty database, called testing, and a user named Alice with a custom role.
The example focuses only on nodes and their labels, though the same principle applies to relationships and their relationship type, and properties (on both nodes and relationships) and their names.
Using the following commands, you define some privileges for the custom role:
GRANT MATCH {*} ON GRAPH testing NODES * TO custom
GRANT CREATE ON GRAPH testing NODES `A` TO custom
GRANT SET LABEL `A` ON GRAPH testing TO custom
GRANT CREATE NEW NODE LABEL ON DATABASE testing TO custom
This means that when Alice executes:
CREATE (:`A`)
She will get the following exception even though she is allowed to create new labels:
Create node with labels 'A' on database 'testing' is not allowed for user 'Alice' with roles [PUBLIC, custom].
However, rerunning the same query will create the node. This is because the failed creation still creates the label, making it no longer non-existent when the query is run a second time.
To ensure success on the first attempt, when setting up the privileges for the custom role, the administrator should run the db.createLabel() procedure on the affected databases for all non-existing labels that get assigned privileges.
In this example, when creating the custom role, connect to testing and run CALL db.createLabel('A') to ensure Alice creates the node successfully on her first attempt.
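The fail-then-succeed sequence can be modeled with a short Python sketch. It is a conceptual illustration of the resolution order described above, not Neo4j code: the `Database` class and `create_node` function are hypothetical, and the model assumes the user holds the privilege to create new labels, as Alice does.

```python
# Toy model of label resolution during authorization; not Neo4j code.
class Database:
    def __init__(self):
        self.known_labels = set()  # labels that exist in this database
        self.nodes = []

    def create_label(self, label):
        """Stands in for CALL db.createLabel(label)."""
        self.known_labels.add(label)


def create_node(db, label, granted_create_labels):
    # Only privileges whose label already resolves are applied.
    effective = granted_create_labels & db.known_labels
    # The attempt registers the label even if the creation then fails
    # (assumes the user may create new labels).
    db.known_labels.add(label)
    if label not in effective:
        raise PermissionError(f"Create node with labels '{label}' is not allowed")
    db.nodes.append({label})


fresh = Database()
try:
    create_node(fresh, "A", {"A"})      # first attempt fails
except PermissionError as error:
    print(error)
create_node(fresh, "A", {"A"})          # second attempt succeeds
print(len(fresh.nodes))                 # 1

prepared = Database()
prepared.create_label("A")              # administrator prepares the label
create_node(prepared, "A", {"A"})       # succeeds on the first attempt
print(len(prepared.nodes))              # 1
```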
Security and performance
Security rules and database operations
The rules of a security model may impact the performance of some database operations, because Neo4j has to do extra security checks, which require additional data access. For example, count store operations, which are usually fast lookups, may experience notable differences in performance.
Let’s take the following example.
The database has two roles defined: restricted and unrestricted.
The restricted role has limited access to traversals, while the unrestricted role has no restrictions.
GRANT TRAVERSE ON GRAPH * NODES Person TO restricted;
DENY TRAVERSE ON GRAPH * NODES Customer TO restricted;
GRANT TRAVERSE ON GRAPH * ELEMENTS * TO unrestricted;
Now, let’s look at what the database needs to do in order to execute the following query:
MATCH (n:Person)
RETURN count(n)
For both roles, the execution plan looks like this:
+--------------------------+
| Operator                 |
+--------------------------+
| +ProduceResults          |
| |                        +
| +NodeCountFromCountStore |
+--------------------------+
Internally, however, very different operations need to be executed. The following table illustrates the difference:
| User with unrestricted role | User with restricted role |
|---|---|
| The database can access the count store and retrieve the total number of nodes with the label :Person. This is a very quick operation. | The database cannot access the count store because it must make sure that only traversable nodes with the desired label :Person are counted. So due to the additional data access required by the security checks, this operation will be slower compared to executing the query as an unrestricted user. |
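The difference in work can be illustrated with a Python sketch. It is a conceptual model with hypothetical data, not a description of Neo4j's storage engine: a precomputed counter stands in for the count store, and the restricted path reports how many nodes it had to examine.

```python
from collections import Counter

# Hypothetical data: each set holds the labels of one node.
NODES = [{"Person"}, {"Person"}, {"Person", "Customer"}, {"Customer"}]
# Precomputed per-label totals, standing in for the count store.
COUNT_STORE = Counter(label for labels in NODES for label in labels)


def count_unrestricted(label):
    """One lookup; no node is touched."""
    return COUNT_STORE[label]


def count_restricted(label, granted, denied):
    """Every candidate node is examined for granted and denied labels."""
    examined = 0
    total = 0
    for labels in NODES:
        if label not in labels:
            continue
        examined += 1
        if labels & granted and not labels & denied:
            total += 1
    return total, examined


print(count_unrestricted("Person"))                          # 3
print(count_restricted("Person", {"Person"}, {"Customer"}))  # (2, 3)
```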
Security rules based on property rules and performance
Extra node or relationship-level security checks are necessary when adding security rules based on property rules, and these can have a significant performance impact.
The following example shows how the database behaves when adding security rules for nodes to roles restricted and unrestricted.
The same limitations apply to relationships.
GRANT TRAVERSE ON GRAPH * FOR (n:Customer) WHERE n.secret <> true TO restricted;
GRANT TRAVERSE ON GRAPH * ELEMENTS * TO unrestricted;
When executing the query:
MATCH (n:Customer)
RETURN n
For both roles, the execution plan looks like this:
+--------------------------+
| Operator                 |
+--------------------------+
| +ProduceResults          |
| |                        +
| +AllNodesScan            |
+--------------------------+
Internally, however, very different operations need to be executed. The following table illustrates the difference:
| User with unrestricted role | User with restricted role |
|---|---|
| The database will scan all nodes and quickly identify accessible nodes based solely on the presence of the specified label. | The database will scan all nodes, identify potentially accessible nodes based on the presence of the specified label, and then also access the properties of each of those nodes and inspect their values to ensure the property rule criteria are met (i.e., that n.secret <> true). |
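The extra property access can be sketched in Python. This is a toy model with hypothetical data, not Neo4j code: it counts how many property reads each role triggers and assumes that a null secret makes the rule n.secret <> true evaluate to null, so the node is not granted.

```python
# Toy model of a property-based TRAVERSE rule; not Neo4j code.
NODES = [
    {"labels": {"Customer"}, "secret": False},
    {"labels": {"Customer"}, "secret": True},
    {"labels": {"Customer"}},  # no secret property (null)
    {"labels": {"Person"}},
]


def secret_is_not_true(value):
    """n.secret <> true; a null value yields null (None)."""
    return None if value is None else value is not True


def match_customers(property_rule=None):
    """Return (matching nodes, number of property reads performed)."""
    property_reads = 0
    result = []
    for node in NODES:
        if "Customer" not in node["labels"]:
            continue
        if property_rule is not None:
            property_reads += 1  # extra access to the node's properties
            if property_rule(node.get("secret")) is not True:
                continue
        result.append(node)
    return result, property_reads


unrestricted_nodes, unrestricted_reads = match_customers()
restricted_nodes, restricted_reads = match_customers(secret_is_not_true)
print(len(unrestricted_nodes), unrestricted_reads)  # 3 0
print(len(restricted_nodes), restricted_reads)      # 1 3
```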