Abstract
Dominance structure of animal societies is likely to be more complex than simple linearity, but few, if any, methods exist for quantifying nonlinear structure. Our R package “Perc” presented a new network-based method, called Percolation and Conductance, which permits nonlinear structure in dominance interactions. It uses information from both direct and indirect dominance pathways to calculate consistency in the direction of transitive dominance pathways from A to B (e.g. via pathway through C, D, and/or E). Greater consistency results in higher certainty that A outranks B. It creates a matrix of probabilities that the row individual outranks the column individual. It performs simulated annealing processes to explore possible rank orders and finds the best possible rank order. By applying simple R functions, researchers will be able to quantify rank order and dominance uncertainty with the help of the package.
Rank relationships amongst members of a social group are most commonly represented as an ordered list of individuals that are ranked from highest to lowest. Numerous methods exist to identify this rank order, and most methods fall under one of two approaches: (1) finding the appropriate ranking by reordering the rows and columns of a win/loss matrix or (2) calculating a cardinal dominance index which ranks individuals by the proportion of others dominated, both of which are based upon a data set of agonistic and/or submissive interactions. Such methods produce a linear rank order, whether or not the society is expected to have a linear hierarchy or whether the data fit the assumption of linearity (which is particularly problematic when calculating cardinal ranks (Shev et al 2012)). Most researchers agree that the dominance structure of animal societies is likely to be more complex than simple linearity, but few, if any, methods exist for quantifying nonlinear structure. We present a new network-based method, called Percolation and Conductance, that permits nonlinear structure to emerge via estimates of network-wide directional consistency in the flow of dominance interactions and detection of blocks of dominance ambiguity that are indicative of nonlinear segments of a hierarchy. An additional benefit of this method is the ability to quantify dyadic-level dominance certainty. Dyadic dominance potentials are calculated using multiple indirect dominance pathways in the network (via common third parties) to infer missing data in the win/loss matrix and enhance the calculation of dominance potentials from direct interactions. Greater consistency in the direction of transitive dominance pathways from A to B (e.g. via pathway through C, D, and/or E) result in higher certainty that A outranks B, whereas greater evidence of inconsistent direction (e.g. some directed pathways go from A to B whereas other directed pathways go from B to A) result in dominance ambiguity. Because the number of indirect pathways is typically exponentially larger than direct win/loss interactions, one can be relatively confident that dyads with ambiguous dominance potentials are truly ambiguous due to evidence of directional inconsistency in their network pathways rather than lack of data.
Rank order using the percolation-conductance method is based on a matrix that combines information from direct win/loss interactions with information from indirect pathways as described above to create a matrix of probabilities that the row individual outranks the column individual. Percolation-conductance finds these multi-step, directed pathways between all pairs of nodes by performing a series of random walks through the network. A simulated annealing process explores possible rank orders (from the space of all possible rank orders) – the more simulated annealing runs that are performed, a larger number of possible rank orders are explored, meaning that the best possible rank order is more likely to be found. The starting location of simulated annealing is determined by the total wins per individual in the matrix. Multiple simulated annealing steps are required to arrive at the optimal rank order, keeping in mind that multiple rank orders may be equally good if there are nonlinear segments in the hierarchy. The heat map function (plot.conf.mat) will plot the ranking and can allow for the detection of non-linear dominance structures. A block of individuals with probabilities near 0.5 indicates a subgroup of individuals whose relationships are not clearly defined. A second metric yielded by this method, dominance uncertainty, reflects the directional consistency of information flow through the direct and indirect pathways in a directed and weighted network.
The Percolation-Conductance method of quantifying rank and dominance uncertainty is computation complicated and requires researchers to have a sophisticated computational background to process the data. Our group, in collaboration with collaboraters in UC Davis Statistics Deparment, developed an R package that implements the percolation-conductance algorithm on directed dyadic interactions. By applying simple R functions, researchers will be able to quantify rank order and dominance uncertainty with the help of the package. The goal of this tutorial is to show how to quantify rank and dominance uncertainty from win-loss raw data.
The function as.conflictmat
is used to import raw edgelists or matrices for further analysis. Raw edgelists or matrices are records for directed dyadic win-loss interactions. The prefered R data types for as.conflictmat
are “matrix
” and “data.frame
”. Raw edgelists or matrices can be imported as R data type of either data.frame
or matrix
to be used in as.conflictmat
.
The most simple and commonly used raw data is a edgelist of two columns consisting all dyadic win-loss interactions, with the first columns being the winner and the second column being the loser. Duplicates are allowed in this two-column edgelist because multiple interactions between a pair are represented by multiple rows of the same winner and loser. Below is an example of a two-column edgelist.
library(Perc)
# displaying the first 5 rows of the example data.
head(sampleEdgelist, 5)
## Iname Rname
## 1 Kale Kibitz
## 2 Kale Kibitz
## 3 Kale Kibitz
## 4 Kale Kibitz
## 5 Kale Kolyma
A weighted edgelist is also allowed as raw edgelist data. It is a “matrix
” or “data.frame
” of three columns, with the first two columns being winner IDs and loser IDs respectively and the third column being the frequency of the win-loss interaction between the winner and the loser. An example of a weighted edgelsit is shown below:
# displaying the first 5 rows of the example data.
head(sampleWeightedEdgelist, 5)
## Initiator1 Recipient1 Freq
## 1 Angelina Alexis 1
## 2 Angelina Darcy 1
## 3 Angelina FirstTimeMom 1
## 4 Angelina Mak 1
## 5 Angelina Tabitha 3
In addition, a matrix representing dyadic win-loss interaction is also accepted as raw data. For example,
# displaying the first 5 rows and columns
1:5, 1:5] sampleRawMatrix[
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 0 0 0 0
## Angelina 1 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 6 0 0 0 0
## C19 0 0 0 0 0
The function as.conflictmat
is used to import raw data of any accepted format, transform raw data to a “conflict matrix
” for further analysis. “conflict matrix
” is very specifically refered to a class in R. If your raw data is an edgelist of either two- or three- column, it will transform the your raw edgelist to a matrix in which counts representing for how many times a row wins over a column. Running this on a win-loss matrix creates an identical conflict matrix. This is required for all data formats to create R object of class conflict matrix.
Below is an example on transforming a two-column edgelist:
# convert an two-column edgelist to conflict matrix
<- as.conflictmat(sampleEdgelist)
confmatrix
# displaying the first 5 rows and columns of the converted matrix
1:5, 1:5] confmatrix[
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0 0 0 0 0
## Kale 0 0 0 0 4
## Kalleigh 0 1 0 0 0
## Keira 0 0 0 0 0
## Kibitz 0 6 0 0 0
Below is an example on transforming a raw win-loss matrix:
# convert a win-loss matrix to conflict matrix
<- as.conflictmat(sampleRawMatrix)
confmatrix2
# displaying the first 5 rows and columns of the converted matrix
1:5, 1:5] confmatrix2[
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 0 0 0 0
## Angelina 1 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 6 0 0 0 0
## C19 0 0 0 0 0
When importing weighted edgelist (three-column), you’ll need to specify weighted = TRUE
if your raw data is a weighted edgelist. Below is an example:
<- as.conflictmat(sampleWeightedEdgelist, weighted = TRUE) confmatrix3
Directionality is very importnat when using the function as.conflictmat
to import raw data. By default, winners (sources) are in the first column and loses (targets) are in the second column for edgelists. Row names are assumed to be the winner for a raw win-loss matrix. Values in a raw win-loss matrix describe that for how many times the corresponding row ID win over the corresponding column ID. If the directionality in your raw data happens to be opposite from the default, you could specify swap.order = TRUE
in as.conflictmat
. It will tell R that the directionality in your raw data is the opposite of default setting. Running as.conflictmat
on your raw data with swap.order = TRUE
will automatically correct the directionality when transforming your raw data to a conflict matrix
. Here is an example,
# displaying the first 5 rows and columns of the raw win-loss matrix
1:5, 1:5] sampleRawMatrix[
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 0 0 0 0
## Angelina 1 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 6 0 0 0 0
## C19 0 0 0 0 0
# use swap.order = TRUE when running as.conflictmat.
<- as.conflictmat(sampleRawMatrix, swap.order = TRUE)
confmatrix4
# displaying the first 5 rows and columns of the converted matrix
1:5, 1:5] confmatrix4[
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 1 0 6 0
## Angelina 0 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 0 0 0 0 0
## C19 0 0 0 0 0
Before computing the dominance probability and finding the best rank order, you want to know whether or not long pathways are present in your data, and if present, whether or not these long pathways can be useful for gathering additional information about rank relationships.
You could find pathways of particular length that starting at a particular node (ID). For example,
# Testing findIDpaths.R
<- findIDpaths(confmatrix, "Kuai", len = 3)
pathKuai # Displaying the first 5 rows of pathKuai
1:5, ] pathKuai[
## [,1] [,2] [,3] [,4]
## [1,] "Kuai" "Kalleigh" "Kale" "Kibitz"
## [2,] "Kuai" "Kalleigh" "Kale" "Kolyma"
## [3,] "Kuai" "Kalleigh" "Kimora" "Kibitz"
## [4,] "Kuai" "Kalleigh" "Kimora" "Kioga"
## [5,] "Kuai" "Kalleigh" "Kimora" "Koppy"
# When there's no pathway starting at a particular individual, you'll get an output like the example below:
<- findIDpaths(confmatrix, ID = "Kalani", len = 2) pathKalani
## no pathways of length 2 found starting at Kalani
You could also use transitivity
to find out information on transitivity, a measure of how consistent or inconsistent the flow of information is through the network as a whole. This procedure looks for the number of triangles (e.g. A→B→C→A) and determines whether or not they are transitive.
<- transitivity(confmatrix)
conftrans $transitive # number of transitive triangles conftrans
## [1] 14
$intransitive # number of intransitive triangles conftrans
## [1] 4
$transitivity # transitivity conftrans
## [1] 0.7777778
$alpha # alpha conftrans
## [1] 6.468627
The function transitivity
allow estimation for alpha
, which is a value used in the conductance procedure to weight the information from the indirect pathways. The higher the transitivity of a network (the higher the alpha), the greater the weight the conductance procedure will give the indirect pathways.
After transforming your raw data in an R object of conflict matrix
class, we will use the function conductance
to find all indirect pathways of a particular length (defined by maxLength
) and update the conflict matrix. The maximum length of indirect pathways should be decided based on the data. We use a max length of 4 for our dominance rank. This yields the dominance probability matrix which represents the probability the row outranks the column. Values in the matrix are from 0 - 1, with 0.5 being the most uncertain. If a value is less than 0.5, the corresponding row ID is more likely to lose against the corresponding column ID; if a value is greater than 0.5, the corresponding row ID is more likely to win over the corresponding column ID.
<- conductance(confmatrix, maxLength = 2) DominanceProbability
DominanceProbability
is a list of 2 elements. The first element, named imputed.conf
is the updated conflict matrix which incorporates information from indirect pathways from the original conflict matrix. Comparing the original conflict matrix with the updated conflict matrix will show the information that is gained through the indirect pathways:
# displaying the first 5 rows and columns of the original conflict matrix
1:5, 1:5] confmatrix[
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0 0 0 0 0
## Kale 0 0 0 0 4
## Kalleigh 0 1 0 0 0
## Keira 0 0 0 0 0
## Kibitz 0 6 0 0 0
# displaying the first 5 rows and columns of the imputed conflict matrix
$imputed.conf[1:5, 1:5] DominanceProbability
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.00000000 0.000000 0 0.00000000 0.0000000
## Kale 0.00000000 0.000000 0 0.00000000 4.0557534
## Kalleigh 0.05575338 1.000000 0 0.11150676 0.2230135
## Keira 0.00000000 0.000000 0 0.00000000 0.0000000
## Kibitz 0.00000000 6.055753 0 0.05575338 0.0000000
Examining information gained through indirect pathways will provide you information to decide what is the appropriate maxLength
for your data. You could examine the information gained through indirect pathways by substracting the original conflict matrix from imputed conflict matrix. The information gained could visually examined by generating a corresponding heatmap using the function plot.conf.mat
.
# substracting the original conflict matrix from imputed conflict matrix.
<- DominanceProbability$imputed.conf - confmatrix
informationGain
# examine the first five rows and columns of the informationGain
1:5, 1:5] informationGain[
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.00000000 0.00000000 0 0.00000000 0.00000000
## Kale 0.00000000 0.00000000 0 0.00000000 0.05575338
## Kalleigh 0.05575338 0.00000000 0 0.11150676 0.22301352
## Keira 0.00000000 0.00000000 0 0.00000000 0.00000000
## Kibitz 0.00000000 0.05575338 0 0.05575338 0.00000000
# generating a heatmap representing information gained by using informatio from indirect pathways.
plotConfmat(informationGain, ordering = NA, labels = TRUE)
As we could see from the heatmap, using maxLength = 2
does provide us more information than the original raw conflict matrix that we imported from the raw data. In real analysis, you may want to consider using maxLength = 3
or even more and repeat this diagnosis process to examine the information gain. This will allow you to have a better idea on what is the appropriate maxLength
for your data.
For example, if you want to examine whether or not maxLength = 3
provide more information than maxLength = 2
. Here is an example:
<- conductance(confmatrix, maxLength = 3)
DominanceProbabilityLength3 <- DominanceProbabilityLength3$imputed.conf - DominanceProbability$imputed.conf
informationGain2 plotConfmat(informationGain2, ordering = NA, labels = TRUE)
As we can see in this heatmap that using maxLength = 3
still gives you some more information. In real analysis, you may want to repeat the analysis and try maxLength = 4
. In addition, you may also want to consider your alpha value to decide how much you want to trust those long pathways.
The second element of DominanceProbability
the dominance probability matrix, named p.hat
. Here’s how you could pull out the dominance probability matrix:
# displaying the first 5 rows and columns of the dominance probability matrix
$p.hat[1:5, 1:5] DominanceProbability
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.0000000 0.5000000 0.4236125 0.5000000 0.5000000
## Kale 0.5000000 0.0000000 0.1180829 0.5000000 0.4040371
## Kalleigh 0.5763875 0.8819171 0.0000000 0.6325280 0.7095211
## Keira 0.5000000 0.5000000 0.3674720 0.0000000 0.4236125
## Kibitz 0.5000000 0.5959629 0.2904789 0.5763875 0.0000000
When you want to use dominance probability matrix for further analysis, often a long-format representation of the matrix is easier. The function dyadicLongConverter
convert win-loss uncertainty to long format.
# displaying the first 5 rows of the converted long format win-loss probability.
dyadicLongConverter(DominanceProbability$p.hat)[1:5, ]
## ID1 ID2 ID1 Win Probability ID2 Win Probability RankingCertainty
## 12 Kalani Kale 0.5000000 0.5000000 0.5000000
## 23 Kalani Kalleigh 0.4236125 0.5763875 0.5763875
## 24 Kale Kalleigh 0.1180829 0.8819171 0.8819171
## 34 Kalani Keira 0.5000000 0.5000000 0.5000000
## 35 Kale Keira 0.5000000 0.5000000 0.5000000
Simulated rank order is computed using a simulated annealing algorithm by running the function simRankOrder
on DominanceProbability$p.hat
. The argument num
is the number of simulated annealing runs to generate best rank order. The argument kmax
tells you how long (perhaps iterations) the simulated annealing function looks around for a local optimum to find the best rank order. By default, kmax = 1000
was used. But it takes a long time to run when using kmax = 1000
. The example below uses kmax = 10 just because it runs quickly and it is enough to illustrate the process, however this value is not recommended when processing real data. Here is how to use simRankOrder
to find the simulated rank order:
# find simRankOrder
<- simRankOrder(DominanceProbability$p.hat, num = 10, kmax = 10) s.rank
The function s.rank
returns a list containing simulated rank order and Costs for each simulated annealing run. The best rank order will be the one with the lowest cost.
# displaying the first 5 rows of the simulated rank order
$BestSimulatedRankOrder[1:5, ] s.rank
## ID ranking
## 1 Koppy 1
## 2 Kioga 2
## 3 Kuai 3
## 4 Kalleigh 4
## 5 Kyushu 5
# displaying cost for each simulated annealing run
$Costs s.rank
## simAnnealRun Cost
## 1 1 2.392065
## 2 2 2.321503
## 3 3 2.392065
## 4 4 2.392065
## 5 5 2.321503
## 6 6 2.321503
## 7 7 2.321503
## 8 8 2.321503
## 9 9 2.321503
## 10 10 2.392065
# rank orders generated by each simulated annealing run
$AllSimulatedRankOrder s.rank
## SimAnneal1 SimAnneal2 SimAnneal3 SimAnneal4 SimAnneal5 SimAnneal6 SimAnneal7
## 1 Koppy Koppy Koppy Koppy Koppy Koppy Koppy
## 2 Kioga Kioga Kioga Kioga Kioga Kioga Kioga
## 3 Kalleigh Kuai Kalleigh Kalleigh Kuai Kuai Kuai
## 4 Kyushu Kalleigh Kyushu Kyushu Kalleigh Kalleigh Kalleigh
## 5 Kimora Kyushu Kimora Kimora Kyushu Kyushu Kyushu
## 6 Kuai Kimora Kuai Kuai Kimora Kimora Kimora
## 7 Kibitz Kalani Kalani Kibitz Kibitz Kalani Kalani
## 8 Kalani Kibitz Kibitz Kalani Keira Kibitz Kibitz
## 9 Kolyma Kolyma Kolyma Keira Kalani Kolyma Kolyma
## 10 Kale Kale Kale Kolyma Kolyma Keira Kale
## 11 Keira Keira Keira Kale Kale Kale Keira
## SimAnneal8 SimAnneal9 SimAnneal10
## 1 Koppy Koppy Koppy
## 2 Kioga Kioga Kioga
## 3 Kuai Kuai Kalleigh
## 4 Kalleigh Kalleigh Kyushu
## 5 Kyushu Kyushu Kimora
## 6 Kimora Kimora Kuai
## 7 Kibitz Kalani Kibitz
## 8 Keira Kibitz Keira
## 9 Kolyma Keira Kalani
## 10 Kale Kolyma Kolyma
## 11 Kalani Kale Kale
After finding the simulated rank order, you can apply the rank order to your dominance probability matrix in order to generate a heatmap with its rows and columns ordered by rank. This heatmap will highlight areas of non-linear dominance rank (i.e. areas that are brown in the example below). A block of individuals with probabilities near 0.5 indicates a subgroup of individuals whose relationships are not clearly defined.
plotConfmat(DominanceProbability$p.hat, ordering = s.rank[[1]]$ID, labels = TRUE)
Besides ranking information, the undirected certainty information in dominance relationship is also important. The values in the dominance probability matrix range from 0-1 indicating both the direction of the relationship and the certainty of the relationship. The function valueConverter
converts all values to values between 0.5 - 1.0, which, in essence, removes directionality from the relationship leaving a single metric of dominance certainty (0.5 indicates total uncertainty and 1.0 indicates total certainty). For example,
# displaying the first 5 rows and columns of the converted matrix
valueConverter(DominanceProbability$p.hat)[1:5, 1:5]
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 1.0000000 0.5000000 0.5763875 0.5000000 0.5000000
## Kale 0.5000000 1.0000000 0.8819171 0.5000000 0.5959629
## Kalleigh 0.5763875 0.8819171 1.0000000 0.6325280 0.7095211
## Keira 0.5000000 0.5000000 0.6325280 1.0000000 0.5763875
## Kibitz 0.5000000 0.5959629 0.7095211 0.5763875 1.0000000
When you want to understand dyadic level dominance uncertainty, you could get the information from dyadicLongConverter
. For example,
# displaying the first 5 rows of Ranking Certainty
dyadicLongConverter(DominanceProbability$p.hat)[1:5, c("ID1", "ID2", "RankingCertainty")]
## ID1 ID2 RankingCertainty
## 12 Kalani Kale 0.5000000
## 23 Kalani Kalleigh 0.5763875
## 24 Kale Kalleigh 0.8819171
## 34 Kalani Keira 0.5000000
## 35 Kale Keira 0.5000000
When individual-level win-loss relationship uncertainty is the focus of your question, the function individualDomProb
computes the mean and standard deviation for individual-level win-loss probability.
individualDomProb(DominanceProbability$p.hat)[1:10,]
## ID Mean SD
## 1 Kalani 0.6079515 0.1705907
## 2 Kale 0.6403923 0.1628646
## 3 Kalleigh 0.7350389 0.1343705
## 4 Keira 0.6606999 0.1782413
## 5 Kibitz 0.6974806 0.1677272
## 6 Kimora 0.7269768 0.1736934
## 7 Kioga 0.7037738 0.1644045
## 8 Kolyma 0.6072364 0.1559790
## 9 Koppy 0.7157361 0.1433639
## 10 Kuai 0.6873502 0.1703318