I want to write some Hypothesis-based tests on a random dataframe. I'm trying to create a df using the following function:
import pandas as pd
from hypothesis import strategies as st

@st.composite
def create_hypothesis_df(draw):
    num_rows = draw(st.integers(min_value=1, max_value=10))  # adjust the number of rows as needed
    data = [
        (
            draw(st.text(min_size=0, max_size=)),
            '1750',
            draw(st.datetimes()),
            draw(st.datetimes()),
            draw(st.floats(min_value=1, max_value=1000)),
            draw(st.floats(min_value=1, max_value=1000)),
            draw(st.floats(min_value=1, max_value=1000)),
            draw(st.text(min_size=0, max_size=100)),
            draw(st.text(min_size=0, max_size=100)),
        )
        for _ in range(num_rows)
    ]
    columns = ["col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9"]
    return pd.DataFrame(data, columns=columns)
However, this always returns a df that has: "", 1750, 2001-01-01, 2001-01-01, 1.000, 1.000, etc. So basically it only ever uses the minimum value of each strategy. I need actual, varied values, because I do some calculations in a transform function that I want to test with something like:
assert (result_df['new_column'] < input_df['col5']).all()
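For reference, a full test would look roughly like the sketch below; transform here is a hypothetical stand-in for my actual transform function:

from hypothesis import given

@given(create_hypothesis_df())
def test_result_stays_below_col5(input_df):
    result_df = transform(input_df)  # hypothetical transform under test
    assert (result_df['new_column'] < input_df['col5']).all()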
Your example is not actually executable, due to the SyntaxError from st.text(min_size=0, max_size=). When I fix that, I get varied examples as expected.

That said, I'd personally reach for Hypothesis' native support for Pandas, which would look like:
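Here's a sketch of that approach, mirroring the column strategies from your composite version (the column names and bounds are carried over as assumptions, and I've used max_size=100 for col1 where your snippet left the value out):

from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames, range_indexes

# Dataframes with 1-10 rows and the same nine columns as above.
df_strategy = data_frames(
    columns=[
        column("col1", elements=st.text(max_size=100)),  # assumed max_size
        column("col2", elements=st.just("1750")),  # constant column
        column("col3", elements=st.datetimes()),
        column("col4", elements=st.datetimes()),
        column("col5", elements=st.floats(min_value=1, max_value=1000)),
        column("col6", elements=st.floats(min_value=1, max_value=1000)),
        column("col7", elements=st.floats(min_value=1, max_value=1000)),
        column("col8", elements=st.text(max_size=100)),
        column("col9", elements=st.text(max_size=100)),
    ],
    index=range_indexes(min_size=1, max_size=10),
)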
The idioms look pretty similar in this case, but this version is much easier to extend to sparse data, specific column dtypes, or other constraints; and is usually faster as data size grows.
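For instance (an illustrative sketch, not from the question), pinning a dtype and generating sparse data is a per-column change:

# a float32 column whose unfilled entries default to 0.0 ("sparse")
column("col5", elements=st.floats(min_value=1, max_value=1000, width=32),
       dtype="float32", fill=st.just(0.0))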