The vital (and often confusing) idea is that there is one virtual world, but any number of coordinate frames may be in use at any moment. Often there is a frame for each object and each viewpoint. This is something you already do all the time in the real world. The hard part is seeing how it works in a virtual world.
For example, say you have a vase set upon a table. You could describe its location as "8 inches from the front edge of the table and 12 inches from the right edge."
Or you could pick some other frame and say "17 inches from the left edge of the table and 38 inches from the back edge." Both descriptions name the same spot; only the frame has changed. Or you could describe where it is in relation to the foot of the left front leg of the table, or....
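To make the vase example concrete, here is a minimal sketch in Python of converting one description into the other. It assumes both descriptions refer to the same vase position on a rectangular table, with distances measured perpendicular to the edges, so the table must be 12 + 17 = 29 inches wide and 8 + 38 = 46 inches deep. The names TABLE_WIDTH, TABLE_DEPTH, and front_right_to_left_back are made up for this illustration.

```python
# Table size implied by the two descriptions of the vase (inches).
# Assumption: a rectangular table, distances measured perpendicular to edges.
TABLE_WIDTH = 12 + 17  # distance from right edge + distance from left edge
TABLE_DEPTH = 8 + 38   # distance from front edge + distance from back edge


def front_right_to_left_back(from_front, from_right):
    """Re-express a point given in the front/right-edge frame
    as distances from the left and back edges (hypothetical helper)."""
    from_left = TABLE_WIDTH - from_right
    from_back = TABLE_DEPTH - from_front
    return from_left, from_back


# The vase has not moved; only the frame used to describe it changes.
vase_left_back = front_right_to_left_back(from_front=8, from_right=12)
print(vase_left_back)  # (17, 38)
```

The vase has one location, and each frame simply gives a different pair of numbers for it. Moving from one frame to another is just arithmetic on those numbers.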