In some examples, an augmented reality (AR) server extracts a plurality of instruction steps from a digital document and, using a prediction model, predicts a plurality of spatial identifiers respectively associated with the plurality of instruction steps, the spatial identifiers respectively corresponding to a plurality of spatial objects in a real-world environment. The AR server generates one or more heatmaps associated with the plurality of spatial objects based on user behavior data associated with the plurality of spatial objects. The AR server selects an anchoring location for each instruction step based on the one or more heatmaps to obtain a plurality of anchoring locations associated with the plurality of spatial objects. The AR server generates AR rendering data for the plurality of instruction steps to be displayed via an AR device at the plurality of anchoring locations. The AR server transmits the AR rendering data to the AR device.
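
The heatmap-based anchor selection described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `build_heatmap`, `select_anchor`, `plan_anchors`, and `Anchor` are hypothetical, each heatmap is assumed to be a small 2D grid of interaction counts over a spatial object's surface, and the anchoring location is assumed to be the cell with the highest count. The text only states that the selection is based on the heatmaps, so another criterion (for example, a low-activity cell that avoids occluding the object) would fit equally well.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]
Heatmap = List[List[float]]  # rows x cols grid of interaction counts (assumed layout)


@dataclass
class Anchor:
    """Hypothetical record pairing an instruction step with its anchoring location."""
    step: str
    spatial_id: str
    cell: Cell


def build_heatmap(events: Iterable[Cell], rows: int, cols: int) -> Heatmap:
    """Accumulate user behavior events (grid cells a user looked at or touched)."""
    grid = [[0.0] * cols for _ in range(rows)]
    for r, c in events:
        if 0 <= r < rows and 0 <= c < cols:  # ignore events outside the object
            grid[r][c] += 1.0
    return grid


def select_anchor(heatmap: Heatmap) -> Cell:
    """Assumed criterion: return the first cell holding the maximum count."""
    best_cell, best_value = (0, 0), heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_value:
                best_cell, best_value = (r, c), value
    return best_cell


def plan_anchors(
    steps: List[str],
    spatial_ids: List[str],  # stand-in for the prediction model's output, one per step
    heatmaps: Dict[str, Heatmap],
) -> List[Anchor]:
    """Select one anchoring location per instruction step."""
    return [
        Anchor(step, sid, select_anchor(heatmaps[sid]))
        for step, sid in zip(steps, spatial_ids)
    ]


heatmaps = {
    "lid": build_heatmap([(0, 1), (0, 1), (1, 0)], rows=2, cols=2),
    "power_button": build_heatmap([(1, 1)], rows=2, cols=2),
}
anchors = plan_anchors(
    ["Open the lid", "Press the power button"],
    ["lid", "power_button"],
    heatmaps,
)
```

In this sketch the spatial identifiers are passed in directly; in the described system they would come from the prediction model, and the resulting anchors would feed the AR rendering data sent to the AR device.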