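Yes, you can calculate the transform from the coordinates of the image's bounding box and the shape of the image.
There are only 4 non-constant parameters in the transform matrix of a north-up image. Two of them are the coordinates of the upper-left corner of the image, which you can get from the bounding box. The other two are the resolutions in the x and y directions. You can calculate those by dividing the width and height of the bounding box by the number of pixels in the x and y dimensions of your image. Note that the y resolution enters the transform with a negative sign, because row numbers increase downward while the y coordinate increases upward.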
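As a rough illustration of the calculation above, here is a minimal sketch in plain Python. It assumes a north-up image with no rotation and a bounding box given as (west, south, east, north); the function names `transform_from_bbox` and `pixel_to_world` and the example numbers are made up for this post. The coefficients are in the (a, b, c, d, e, f) order that rasterio's `Affine` uses, and as far as I know `rasterio.transform.from_bounds(west, south, east, north, width, height)` gives you the equivalent transform directly.

```python
def transform_from_bbox(west, south, east, north, width, height):
    """Affine coefficients (a, b, c, d, e, f) for a north-up image.

    Assumes no rotation, so b and d are the constant zeros.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    x_res = (east - west) / width     # size of one pixel in x
    y_res = (north - south) / height  # size of one pixel in y
    # c and f are the upper-left corner; e is negative because
    # rows count downward from the top edge.
    return (x_res, 0.0, west, 0.0, -y_res, north)


def pixel_to_world(coeffs, col, row):
    """Map a pixel corner (col, row) to world coordinates (x, y)."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


# Hypothetical example: a 1 x 1 degree box downloaded as 100 x 200 pixels.
bbox = (13.0, 45.0, 14.0, 46.0)  # west, south, east, north
coeffs = transform_from_bbox(*bbox, width=100, height=200)
upper_left = pixel_to_world(coeffs, 0, 0)
lower_right = pixel_to_world(coeffs, 100, 200)
```

If the transform is right, pixel (0, 0) should map to the upper-left corner of the bounding box and pixel (width, height) to the lower-right corner, which is an easy sanity check before using it for rasterization.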
Then I apply the mask to the downloaded array by multiplying both arrays.
However, I’m worried that the rasterization of the polygon done by rasterio might differ from the one done by Sentinel Hub when clipping the image to the requested BBox. This would mean the pixels are not well aligned, and I’m masking out the wrong pixels.