Skip to content

feat: adds condition class and assoc. unit tests #2159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

chalmerlowe
Copy link
Collaborator

Adds a Condition class and associated suite of tests.

🦕

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery API. labels Apr 10, 2025
@chalmerlowe chalmerlowe marked this pull request as ready for review April 11, 2025 11:45
@chalmerlowe chalmerlowe requested review from a team as code owners April 11, 2025 11:45
@chalmerlowe chalmerlowe requested a review from whuffman36 April 11, 2025 11:45
"""str: The expression string for the condition."""

# Cast assumes expression is always set due to __init__ validation
return typing.cast(str, self._properties.get("expression"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my education, why is typing.cast() necessary for expression, but not title or description?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both mypy and pytype struggle with correctly assessing return types. In this case, both checkers think the return type should be Optional[Any] when we expect (and validate for) a str.

Despite the fact that we indicate via typehinting:

  • that expression is a str in the __init__() method
  • the expression getter returns a str
  • the expression setter method signature requires that value be input as a str
  • the setter method has internal checks to ensure that the value stored in the _properties dict is a str

neither mypy nor pytype will believe it.

It is not inherently clear why they believe our typehinting for title and description but not expression. I have come across this failure multiple times in adding objects to our repos.

There are approximately seven similar examples of this problem elsewhere in this file (that predate my taking over the repo) (and there are examples in other files in this codebase).

These are the error messages we see:

mypy

google/cloud/bigquery/dataset.py:1140: error: Incompatible return value type (got "Optional[Any]", expected "str")  [return-value]

pytype

/repo/python-bigquery/google/cloud/bigquery/dataset.py:1140:1: error: in expression: bad option 'None' in return type [bad-return-type]
           Expected: str
  Actually returned: Optional[Any]

condition = Condition(expression=self.EXPRESSION)
expected_api_repr = {
"expression": self.EXPRESSION,
"title": None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with Condition, do we need to distinguish between title and description being not set versus being an empty string?

Copy link
Collaborator Author

@chalmerlowe chalmerlowe Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a pair of asserts to test_setters to confirm the use case that assigning empty strings is allowed and succeeds.

I think there are two issues here:

  1. this test is ensuring that if I try to create a Condition object using ONLY the required expression argument and leave the other arguments (title, description) blank, will it create the expected outcome: i.e. a create an object with None values assigned to both title and description?

  2. Do we need to test how the setter for title and description handle a range of values (None, empty string, and non-empty string, something besides the above, etc)

  • In test_setters we confirm that we can reassign a new string to either title or description AND we confirm that we can assign a new value of None (WAI)
  • In test_constructor_and_getters_full we confirm that we can set a non-empty string (WAI)
  • In test_validation_setters we confirm that something besides a string will fail (WAI)

Our logic does not hinge on whether a value assigned to title or description is Falsey OR not. We do not do a boolean check NOR do empty strings trigger some logic so there is no difference between assigning an empty string vs a non-empty string. I am not convinced that we need the test for empty strings, but I don't believe it will hurt us.


def to_api_repr(self) -> Dict[str, Any]:
"""Construct the API resource representation of this Condition."""
return self._properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to make a deep copy of the dict, similar to other classes, such as Table.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we have a compelling reason to use deepcopy, I am disinclined. Thoughts?

In a previous experimental PR, Tim left a comment about some of the pitfalls of using deep copy, if not necessary.

It might be a bit wasteful to make deepcopy here and in `from_api_repr`. Indeed it's safer, but could add a lot of overhead. IIRC we actually removed some `deepcopy` calls from `SchemaField` because it was slowing down customers who build dynamic schemas in their code.

See: #6 and #26

In all cases we are returning a dict (thus there should not be a significant risk of some underlying nested value being changed by another expression in the code)

I was curious about the overall cost of a deepcopy so I did a fairly simple experiment:

In [1]: import copy
In [2]: api_repr = {
   ...:      "expression": "some_value",
   ...:      "title": "some_value",
   ...:      "unexpected_field": "some_value",
   ...: }
In [3]: _properties['condition'] = api_repr

From fastest to slowest:

Simply assign an alias to the dict

In [4]: %timeit new_prop = _properties
14.4 ns ± 0.0149 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

# NOTE: when we use `return _properties`, it is the same as using an alias.

Use the builtin copy method in the dict class

This creates a shallow copy of the dict

In [11]: %timeit new_prop = _properties.copy()
58.6 ns ± 0.228 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Create a copy.copy

This also creates a shallow copy of the dict

In [9]: %timeit new_prop = copy.copy(_properties)
145 ns ± 0.401 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Create a copy.deepcopy

In [5]: %timeit new_prop = copy.deepcopy(_properties)
3 µs ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

@chalmerlowe chalmerlowe merged commit a69d6b7 into main Apr 16, 2025
18 checks passed
@chalmerlowe chalmerlowe deleted the feat-b330869964-add-dataset-condition-access-policy-version branch April 16, 2025 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants