Session 3
Turning raw exports into reliable evidence.
Poor data hygiene undermines every downstream analysis, no matter how sophisticated the model. Building on the collection lab, this session turns raw exports into a consistent, defensible dataset. Students learn to normalize author, institution, and keyword fields while tracing provenance. They produce a cleaned dataset and a data dictionary that specifies fields, assumptions, and transformations. Skills developed include deduplication strategies, metadata alignment, and reproducible documentation. This cleaned base enables reliable network construction and text analytics in subsequent labs.
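As a rough illustration of the normalization and deduplication steps described above, the following Python sketch uses only the standard library. The field names (`doi`, `title`, `author`), the helper names, and the matching rule (DOI when present, otherwise normalized title) are assumptions made for this example, not the course's prescribed method. The returned log is one simple way to keep the provenance of each dropped row.

```python
import re
import unicodedata


def normalize_text(value):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def deduplicate(records):
    """Keep the first record per key and log every dropped row.

    Assumed rule: match on DOI when present, otherwise on normalized title.
    """
    seen = {}   # key -> row index of the record that was kept
    kept = []
    log = []    # provenance trail for the data dictionary
    for index, record in enumerate(records):
        doi = (record.get("doi") or "").strip().lower()
        if doi:
            key = ("doi", doi)
        else:
            key = ("title", normalize_text(record.get("title") or ""))
        if key in seen:
            log.append(
                {"dropped_row": index, "kept_row": seen[key], "matched_on": key[0]}
            )
            continue
        seen[key] = index
        cleaned = dict(record)  # copy so the raw export stays untouched
        cleaned["author_norm"] = normalize_text(record.get("author") or "")
        kept.append(cleaned)
    return kept, log
```

Calling `deduplicate(rows)` on a list of dictionaries returns the retained records plus the log of dropped rows. A known limitation of this sketch is that a record with a DOI and a duplicate without one will not be matched, so a real workflow would likely need a second pass on titles.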