If I understand you correctly, I think your expectation is wrong. I think you are expecting the image to look the same through the ACES sRGB ODT as the EXR would with just an sRGB curve as a view LUT. But a simple sRGB view LUT maps linear 1.0 to peak output, leaving no room for highlights above 1.0, which simply get clipped. ACES includes a tone curve which rolls off the highlights, and in order to “make space” for those, the mid tones need to be dropped down a little.
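To make the difference concrete, here is a minimal Python sketch. It is not the ACES transform: `toy_tonemap` is a stand-in "extended Reinhard" roll-off, and the `WHITE = 4.0` parameter is a value I picked purely for illustration. Only the sRGB encoding follows the standard piecewise formula. The size of the mid-tone drop is therefore not representative of ACES, only its direction.

```python
WHITE = 4.0  # hypothetical scene value that the toy curve maps to display peak


def srgb_encode(x):
    """Standard piecewise sRGB encoding, clipped to the display range [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1.0 / 2.4) - 0.055


def toy_tonemap(x, white=WHITE):
    """Stand-in highlight roll-off (extended Reinhard), NOT the ACES tone curve."""
    x = max(x, 0.0)
    return x * (1.0 + x / white ** 2) / (1.0 + x)


def view_simple(x):
    """Plain sRGB view LUT: everything above 1.0 clips to peak."""
    return srgb_encode(x)


def view_tonemapped(x, white=WHITE):
    """Tone curve first, then sRGB encoding: highlights keep some separation."""
    return srgb_encode(toy_tonemap(x, white))


if __name__ == "__main__":
    for scene in (0.18, 1.0, 2.0, 4.0):
        print(scene, view_simple(scene), view_tonemapped(scene))
```

With the plain curve, scene values of 1.0, 2.0 and 4.0 all land on the same peak code value. With the roll-off they stay distinct, but mid grey (0.18) comes out darker than it does through the plain curve, which is the trade-off described above.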
You can use an inverse output transform as an IDT to achieve what you want, but you need to consider whether that is really appropriate for your situation. Inverse output transforms create scene-linear values which are not always physically plausible (sRGB white effectively becomes a scene light source) and may behave in unexpected ways in compositing.
You can see in the screenshot below that while the inverse output transform achieves a match to the simple sRGB curve, it does so by giving the white patch a scene luminance of ~3.4.
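As a rough illustration of why the inverse pushes bright display values above 1.0, here is a small Python sketch. Again, this is not the ACES inverse output transform: the tone curve is a stand-in (extended Reinhard), and `WHITE = 4.0` is a hypothetical parameter, so the numbers will not reproduce the ~3.4 above. The only point is that inverting any curve which maps a scene value greater than 1.0 to display peak must send display white back to that scene value.

```python
import math

WHITE = 4.0  # hypothetical scene value that the toy curve maps to display peak


def srgb_decode(v):
    """Standard piecewise sRGB decoding: code value -> display-linear."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def toy_tonemap(x, white=WHITE):
    """Stand-in forward roll-off (extended Reinhard), NOT the ACES tone curve."""
    return x * (1.0 + x / white ** 2) / (1.0 + x)


def toy_tonemap_inverse(y, white=WHITE):
    """Analytic inverse of toy_tonemap for y in [0, 1] (positive quadratic root)."""
    a = 1.0 / white ** 2
    b = 1.0 - y
    return (-b + math.sqrt(b * b + 4.0 * a * y)) / (2.0 * a)


def inverse_view(v, white=WHITE):
    """sRGB code value -> toy 'scene-linear' value, as an inverse-ODT-style IDT would do."""
    return toy_tonemap_inverse(srgb_decode(v), white)


if __name__ == "__main__":
    for code in (0.5, 0.9, 1.0):
        print(code, srgb_decode(code), inverse_view(code))
```

A plain sRGB decode returns 1.0 for display white, whereas the toy inverse returns `WHITE`, so in comp that pixel behaves like an emitter rather than a diffuse white surface.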