The second alternative sounds plausible to me. The inline video is actually smaller than the full-screen option of the built-in apps, so I think that might be a good reason to use the latter.
Is it not possible to intercept the touch events with an extra jQuery event handler that distinguishes between a scroll and a tap on the embedded player?
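To make the idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It records where the finger went down, compares that with where it came up, and treats small movements as a tap and anything larger as a scroll. The 10 px threshold, the `.video-overlay` selector and the `playertap` event name are all assumptions of mine. Note also that if the player sits in an iframe, touches inside it never reach the parent page, so the sketch assumes a transparent element layered over the player.

```javascript
// Assumed threshold: a finger that moves less than this counts as a tap.
var TAP_MAX_MOVE_PX = 10;

// Classify a gesture from its start and end coordinates.
function classifyGesture(start, end) {
  var dx = Math.abs(end.x - start.x);
  var dy = Math.abs(end.y - start.y);
  return (dx <= TAP_MAX_MOVE_PX && dy <= TAP_MAX_MOVE_PX) ? 'tap' : 'scroll';
}

// Pull the first changed touch point out of a (jQuery-wrapped) touch event.
function touchPoint(event) {
  var original = event.originalEvent || event;
  var touch = (original.changedTouches && original.changedTouches[0]) || original;
  return { x: touch.pageX, y: touch.pageY };
}

// Wire the handlers up only when jQuery is actually loaded.
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    var start = null;

    // '.video-overlay' is a hypothetical element placed over the player.
    $('.video-overlay')
      .on('touchstart', function (e) {
        start = touchPoint(e);
      })
      .on('touchend', function (e) {
        if (!start) { return; }
        var kind = classifyGesture(start, touchPoint(e));
        start = null;
        if (kind === 'tap') {
          // Suppress the synthetic click that would follow the tap.
          e.preventDefault();
          // 'playertap' is a made-up custom event; react to it as needed.
          $(this).trigger('playertap');
        }
        // For a scroll we do nothing, so the page scrolls as usual.
      });
  });
}
```

A handler bound with `$('.video-overlay').on('playertap', ...)` could then start playback or open the full-screen player, while scroll gestures pass through untouched.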