You're using Xerte 3.10.5 and the interactive video page, right? My advice would be more or less the same for Media Lesson too.
Watch your video and make a note of the time at each point where you want an interaction to happen. You'll need to enter these times in seconds, so a point 1 min 30 seconds in goes in as 90. You don't have to watch in real time; you can scrub through the video to just before each point.
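If your notes end up as mm:ss timestamps, a few lines of Python will turn them into the seconds values to type in. This is only a rough sketch for working outside Xerte: the `to_seconds` helper and the sample timestamps are made up for illustration and are not part of Xerte.

```python
def to_seconds(timestamp: str) -> int:
    """Convert 'm:ss' or 'h:mm:ss' into whole seconds (hypothetical helper)."""
    seconds = 0
    for part in timestamp.split(":"):
        # Each colon-separated part is worth 60x the part after it.
        seconds = seconds * 60 + int(part)
    return seconds


if __name__ == "__main__":
    # Example timestamps noted while scrubbing through a video (made up).
    for noted in ["0:45", "1:30", "12:05"]:
        print(noted, "->", to_seconds(noted))
```

So a note of 1:30 comes out as 90, which is the value you'd enter.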
Add each of the items (text, MCQ, XOT) and enter the relevant time in its Synch point field. It's best to keep the items in time order, so that items with later synch points come later in the tree.
In the Choose Location box, draw a rectangle to set where that content will appear.
If you want that content to disappear before the next item appears, set the optional synch point end, also in seconds.
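If you have a lot of items, it can help to sanity-check your list of times before typing them in. The sketch below is just a planning aid outside Xerte, and everything in it (the `check_plan` function, the item names, the times) is hypothetical. It assumes you list the items in the same order as the tree, each with a synch point and an optional end, all in seconds.

```python
def check_plan(items):
    """Return a list of warnings for a planned set of interactions.

    items: list of (name, synch_point, end_or_None) in tree order.
    This is a hypothetical helper, not anything Xerte provides.
    """
    problems = []
    previous_start = None
    for name, start, end in items:
        # Later items in the tree should have later synch points.
        if previous_start is not None and start < previous_start:
            problems.append(
                f"{name}: synch point {start}s is earlier than the item above it ({previous_start}s)"
            )
        # An end point only makes sense if it comes after the start.
        if end is not None and end <= start:
            problems.append(
                f"{name}: end {end}s is not after its synch point {start}s"
            )
        previous_start = start
    return problems


if __name__ == "__main__":
    plan = [
        ("Intro text", 15, 40),   # appears at 15s, disappears at 40s
        ("Quiz question", 90, None),  # no end set
        ("Embedded page", 60, 50),    # out of order, and ends before it starts
    ]
    for warning in check_plan(plan):
        print(warning)
```

With the made-up plan above, the last item is flagged twice: once for being out of order and once for ending before it starts.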
If you want the item to appear first as an icon, add the optional event and configure it as you wish.
The other options, such as pause media, answer type and disable controls, should be self-explanatory.
To a certain extent you need to try it and see, changing options to find out what they do.