-
Notifications
You must be signed in to change notification settings - Fork 316
feat: Update the AccessEntry class with a new condition attribute and unit tests #2163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
google/cloud/bigquery/dataset.py
Outdated
# The api_repr for an AccessEntry object is expected to be a dict with | ||
# only a few keys. Two keys that may be present are role and condition. | ||
# Any additional key is going to have one of ~eight different names: | ||
# userByEmail, groupByEmail, domain, dataset, specialGroup, view, | ||
# routine, iamMember |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this. Why are we leaving out some data?
Couldn't we do:
access_entry = cls()
access_entry._properties = resource.copy()
return access_entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this question. I am open to other approaches.
This approach is only slightly modified from the existing code. The existing code attempted to account for a peculiarity in how we assign values to the "possible" attributes that an AccessEntry
"needs to have" and "might have" based on the way we have defined it.
I tweaked the existing code to maintain backwards compatibility and avoid rewrites in things like existing unittests, where possible.
The api_repr
for an AccessEntry object is composed of the following keys (internal link):
- ROLE
- ONE OF THE FOLLOWING (and as I understand it, only one) key:value pairs (as listed in the proto above):
a. userByEmail: some_value
b. groupByEmail: some_value
c. domain: some_value
d. specialGroup: some_value
e. IamMember: some_value
f. view: some_value
g. routine: some_value
h. dataset: some_value - CONDITION
Each of the three items above is stored in the _properties
dict (in one form or another, see below).
Depending on how the AccessEntry
object gets configured, each of a
-h
above MAY also be associated with a corresponding attribute (ie access_entry.domain
, access_entry.user_by_email
) but this is not enforced.
Which element a
through h
will be provided as part of the api_repr
is an unknown.
Despite the presence of one of a
through h
, to instantiate the AccessEntry
class users would need to provide role
, entity_type
, and entity_id
or rely on the defaults. Thus since we received a key a
through h
there must be a way to translate the key to entity_type
and to translate from the value associated with that key to entity_id
.
Once the AccessEntry
object is created users also have the ability to populate other attributes that align on a one-to-one basis with item a
through h
by using a setter. (This appears to be included as a convenience feature, does NOT happen automagically during initialization, and is not a strict requirement).
If we receive an api_repr
resource we use the logic in .from_api_repr
to try and extract the a
through h
key (without which one we got) to assign it to entity_type
and then assign the value to entity_id
.
We first pop the "role" if it exists and we pop the "condition" if it exists and the only thing that should be left in the dict is the single element a
or b
or c
, etc. We have no expectation that there will be an additional key:value pair in the dictionary. Thus we do not expect to be dropping any values.
If my interpretation of how this works is wrong, please let me know.
An alternative approach could be to do a key lookup for any key in a set of eight keys, but that seems less resilient than the pop method (ie if a ninth key gets added to the API definition, the current PR still works).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current approach is still quite brittle. If/when the backend adds additional properties, similar to how condition is being added now, the user will have no way of accessing those, as _properties
isn't being saved. This can especially be a problem when doing a round trip of downloading access entries, modifying them and re-uploading them. Losing a property like condition (or whatever new thing comes in future) can have security consequences.
For these reasons, I strongly encourage changing from_api_repr
to my proposal
access_entry = cls()
access_entry._properties = resource.copy()
return access_entry
and modify entity_type
to look at _properties
for keys other than condition
and role
. In the case of two or more remaining keys, we can perhaps use an allow list of the entity types we know about to account for some new condition
-like property in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tswast PTAL and approve, as appropriate.
I updated the _from_api_repr()
method.
And fixed a number of downstream elements that relied upon the output of the original version of that method.
google/cloud/bigquery/dataset.py
Outdated
return ( | ||
self.expression == other.expression | ||
and self.title == other.title | ||
and self.description == other.description | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could do the _key()
trick here too and in __hash__
to avoid a bit of duplication / potential errors if any new properties are added.
tests/unit/test_dataset.py
Outdated
exp_resource = { | ||
"role": None, | ||
"dataset": { | ||
"dataset": DatasetReference("my-project", "my_dataset"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems not great. What happens when we convert this resource to JSON and send it to the REST API? I would expect that json.dumps
or the equivalent would fail on such on object.
Are we missing something in the @dataset.setter
? Looks like we need a clause like
if isinstance(value, DatasetReference):
value = value.to_api_repr()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much Chalmer. This leaves the code base in much better shape than it started. 🏕️
if self.entity_type: | ||
entity_type = self.entity_type | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: This could use the "walrus operator" to save a call to the entity_type property.
if self.entity_type: | |
entity_type = self.entity_type | |
else: | |
if not (entity_type := self.entity_type): |
Updates the
AccessEntry
class with a newcondition
attribute and unit tests.Specifically:
This work is in support of internal bug: b/330869964