Phase one required turning qualitative data into quantitative data. We started by recording all of the pertinent data from the PDF version of the Leven and Melville Papers. We then entered this data into a spreadsheet under the following categories: Id, Sender, Receiver, Location from, Location to, Latitude and Longitude, Type, and Date. This allowed us to parse the network data in the form of nodes and edges. The data was then cleaned using OpenRefine to split latitude, longitude, and keywords into separate columns; we also made sure there were no blank cells or duplicates. After creating a master spreadsheet, we set about creating separate sheets for the different visualizations, including people, places, keywords, nodes, and edges (relationships). We were able to get all the data from the 599 letters contained in the digitized copy of the Leven and Melville Papers into a CSV file.
The size of this dataset of letters allowed for exploratory data analysis and for testing different digital tools to identify the best representation of the relationships presented in the papers. One of the main tools we ended up using was the programming language Python, which has a large number of libraries that extend the capabilities of the language and allow for complex visualizations of the network data. One of the most prominent libraries we used was networkx, which allowed us to create network graphs and to apply the Girvan-Newman algorithm to detect communities within the network. This algorithm works by repeatedly removing the edge with the highest betweenness, that is, the edge lying on the largest number of shortest paths between nodes, so that the network gradually separates into communities. The nodes are then given corresponding colors to highlight their community, making it easier to identify groups in the network.
The algorithm is important for understanding the network graph because it rests on the notion of betweenness. A node with higher betweenness centrality has more control over the network, because more information passes through that node. The implementation of these libraries and the creation of the visuals were carried out in Jupyter Notebook, an open-source application for interactive computing. In addition, we experimented with tools such as Leaflet, Flourish, and Gephi for further analysis of the letters.
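To make the workflow concrete, the following is a minimal sketch of how sender and receiver columns can be turned into a networkx graph and split into communities with Girvan-Newman. It is an illustration under stated assumptions rather than the notebook used for the project: the sample rows are invented, "Correspondent A" and "Correspondent B" are placeholders, and the helper names build_graph, first_split, and community_colors are hypothetical.

```python
import csv
import io

import networkx as nx
from networkx.algorithms.community import girvan_newman

# Invented sample rows for illustration only; the real file holds 599 letters.
SAMPLE_CSV = """Id,Sender,Receiver
1,Melville,Crawford
2,Melville,Hamilton
3,Crawford,Hamilton
4,Melville,Stair
5,Stair,Correspondent A
6,Stair,Correspondent B
7,Correspondent A,Correspondent B
"""


def build_graph(csv_text):
    """Build an undirected graph; repeated pairs increase the edge weight."""
    graph = nx.Graph()
    for row in csv.DictReader(io.StringIO(csv_text)):
        sender, receiver = row["Sender"], row["Receiver"]
        if graph.has_edge(sender, receiver):
            graph[sender][receiver]["weight"] += 1
        else:
            graph.add_edge(sender, receiver, weight=1)
    return graph


def first_split(graph):
    """Return the first Girvan-Newman partition as sorted lists of names."""
    # girvan_newman yields successively finer partitions; take the first one.
    communities = next(girvan_newman(graph))
    return sorted(sorted(members) for members in communities)


def community_colors(communities, palette=("tab:blue", "tab:orange", "tab:green")):
    """Map each node to a color name according to its community."""
    return {
        node: palette[index % len(palette)]
        for index, members in enumerate(communities)
        for node in members
    }


if __name__ == "__main__":
    letters = build_graph(SAMPLE_CSV)
    groups = first_split(letters)
    print(groups)
    print(community_colors(groups))
```

In this toy network the only link between the two triangles is the Melville to Stair edge, so it carries the most shortest paths and is the first edge removed. The resulting color mapping could be passed as the node_color argument when drawing the graph.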
Most Connected Nodes
Degree Centrality
Our exploration began with a degree centrality analysis of the letter corpus. This is a count of the number of edges, or connections, attached to a single node. Degree centrality is the simplest measure of a node's significance in the network, that is, of where it is structurally important, and it often reveals the clusters that dominate the network. Unsurprisingly, Melville has the largest centrality count at 499; he is followed by Crawford with a score of 87, Hamilton with a score of 56, and John Dalrymple of Stair with 46.
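A degree count of this kind can be sketched in a few lines of networkx. The example below assumes that every letter counts as one connection, which is why it uses a MultiGraph; whether the project counted letters or distinct correspondents is not stated, so this is an assumption. The letter list is invented and the helper names letter_counts and normalized_centrality are hypothetical.

```python
import networkx as nx

# Invented (sender, receiver) pairs for illustration only.
LETTERS = [
    ("Crawford", "Melville"),
    ("Crawford", "Melville"),
    ("Hamilton", "Melville"),
    ("Stair", "Melville"),
    ("Stair", "Hamilton"),
]


def letter_counts(letters):
    """Raw degree per person, counting every letter as a separate edge."""
    graph = nx.MultiGraph()
    graph.add_edges_from(letters)
    # Sort by descending degree, then alphabetically for a stable order.
    return sorted(graph.degree(), key=lambda pair: (-pair[1], pair[0]))


def normalized_centrality(letters):
    """Degree centrality on distinct correspondents, scaled by n - 1."""
    graph = nx.Graph()
    graph.add_edges_from(letters)
    return nx.degree_centrality(graph)


if __name__ == "__main__":
    for person, count in letter_counts(LETTERS):
        print(person, count)
    print(normalized_centrality(LETTERS))
```

The raw count corresponds to scores of the kind reported above, while nx.degree_centrality divides each node's degree by the number of other nodes, which makes networks of different sizes comparable.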
Geography and place
Although William never set foot in Scotland, the considerable two-way flow of correspondence between Edinburgh and London, combined with the numerous journeys made by officers of state, meant that he had a great deal of information about the situation in Scotland and was able to relay instructions to his Scottish counterparts. The high volume of post and of letters referring to the situation in Scotland suggests that the secretaries stayed in frequent contact with their counterparts in Edinburgh. One thing that is evident from the map produced with Leaflet is that the network extended beyond Scotland and England: letters in Melville's corpus ranged from Dublin and Ballyhara in Ireland to Brussels and Gerpines on the continent. The largest clusters were in Edinburgh and London, which is unsurprising since William's most important ministers resided in the respective capitals and centers of governance.