Thank you for your interest in the incidentally package! The incidentally package is designed to generate random incidence matrices and bipartite graphs under different constraints or using different generative models.
The incidentally
package can be cited as:
Neal, Z. P. (2022). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
For additional resources on the incidentally package, please see https://www.rbackbone.net/. If you have questions about the incidentally package or would like an incidentally hex sticker, please contact the maintainer Zachary Neal by email (zpneal@msu.edu) or via Mastodon (@zpneal@mastodon.social). Please report bugs in the backbone package at https://github.com/zpneal/incidentally/issues.
An incidence matrix is a binary \(r \times c\) matrix I that records associations between \(r\) objects represented by rows and \(c\) objects represented by columns. In this matrix, \(I_{ij} = 1\) if the ith row object is associated with the jth column object, and otherwise \(I_{ij} = 0\). An incidence matrix can be used to represent a bipartite, two-mode, or affiliation network/graph, in which the rows represent one type of node, and the columns represent another type of node (e.g., people who author papers, species living in habitats) (Latapy, Magnien, and Del Vecchio 2008). An incidence matrix can also represent a hypergraph, in which each column represents a hyperedge and identifies the nodes that it connects.
For example: \[I = \begin{bmatrix} 1 & 0 & 1 & 0 & 1\\ 0 & 1 & 1 & 1 & 1\\ 0 & 1 & 0 & 1 & 0 \end{bmatrix} \] is a \(3 \times 5\) incidence matrix that represents the associations of the three row objects with the five column objects. If the rows represent people and the columns represent papers they wrote, then \(I_{1,1} = 1\) indicates that person 1 wrote paper 1, while \(I_{1,2} = 0\) indicates that person 1 did not write paper 2. One key property of an incidence matrix is its marginals, or when the matrix represents a bipartite graph, its degree sequences. In this example, the row marginals are \(R = \{3,4,2\}\), and the column marginals are \(C = \{1,2,2,2,2\}\). This incidence matrix can also be represented as a bipartite graph, where the row nodes are labeled with uppercase letters, and the column nodes are labeled with lowercase letters:
The incidentally package can be loaded in the usual way:
set.seed(5)
library(incidentally)
Upon successful loading, a startup message will display that shows
the version number, citation, ways to get help, and ways to contact me.
Here, we also set.seed(5)
to ensure that the examples below
are reproducible.
The incidentally package offers multiple incidence matrix-generating functions that differ in how the resulting incidence matrix is constrained. These functions are described in detail below, but briefly:
incidence.from.probability()
, incidence.from.vector()
, and incidence.from.distribution()
generate incidence matrices and bipartite graphs with given constraints
on its fill/density and marginals/degrees.
incidence.from.adjacency()
uses one of several generative models to create an indicence matrix or
bipartite graph from a unipartite graph.
curveball()
randomizes an
incidence matrix or bipartite graph while preserving in marginal/degree
sequence, and add.blocks()
adds a
block structure or planted partition to an existing incidence matrix or
bipartite graph.
incidence.from.congress()
constructs an incidence
matrix or bipartite graph representing US Congress legislators’
sponsorship of bills. For a detailed description of this function, and
its use with the backbone
package to generate political
networks, see this companion
vignette.
The incidence.from.probability()
function generates an
incidence matrix or bipartite graph with a given probabaility \(p\) that \(I_{ij}
= 1\), and thus an overall fill rate or density of
approximately \(p\). We can use it to
generate a \(10 \times 10\) incidence
matrix in which \(Pr(I_{ij} = 1) \approx
.2\):
<- incidence.from.probability(10, 10, .2)
I
I#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0 0 1 0 0 0 0 1 0 0
#> [2,] 0 0 0 0 0 1 0 1 0 0
#> [3,] 0 0 0 0 0 0 1 1 0 0
#> [4,] 0 0 0 0 0 0 0 0 1 1
#> [5,] 1 1 0 1 1 0 0 0 0 0
#> [6,] 0 0 0 1 0 0 1 0 0 0
#> [7,] 0 0 0 0 0 0 0 0 1 0
#> [8,] 0 0 0 0 0 0 0 0 0 1
#> [9,] 0 0 0 0 0 0 0 1 0 0
#> [10,] 0 0 0 0 0 0 1 0 0 0
mean(I) #Fill rate/density
#> [1] 0.18
By default, incidence.from.probability()
only generates
incidence matrices in which no rows or columns are completely empty or
full. We can relax this constraint, allowing some rows/columns to
contain all 0s or all 1s by specifying
constrain = FALSE
:
<- incidence.from.probability(10, 10, .2, constrain = FALSE)
I
I#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0 0 0 1 0 0 0 0 0 0
#> [2,] 0 0 0 1 0 0 1 0 0 0
#> [3,] 0 0 0 0 0 0 0 1 0 1
#> [4,] 0 0 0 0 1 0 0 1 0 0
#> [5,] 0 0 1 0 0 0 0 0 0 1
#> [6,] 0 0 0 0 0 0 1 0 1 0
#> [7,] 0 0 0 0 1 0 1 1 0 0
#> [8,] 0 0 0 0 1 0 0 0 0 0
#> [9,] 0 0 0 0 1 0 0 0 0 0
#> [10,] 1 0 0 0 1 1 1 0 0 0
mean(I) #Fill rate/Density
#> [1] 0.2
The incidence.from.vector()
function generates an
incidence matrix with given row and column marginals, or a bipartite
graph with given row and column node degrees. The generated matrix or
graph represents a random draw from the space of all such matrices or
graphs. We can use it to generate a random incidence matrix with \(R = \{3,4,2\}\) and \(C = \{1,2,2,2,2\}\):
<- incidence.from.vector(c(4,3,2), c(1,2,2,2,2))
I
I#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 1 0 1
#> [2,] 0 0 1 1 1
#> [3,] 0 1 0 1 0
rowSums(I) #Row marginals
#> [1] 4 3 2
colSums(I) #Column marginals
#> [1] 1 2 2 2 2
The incidence.from.distributions()
function generates an
incidence matrix (or bipartite graph) in which the row and column
marginals (or row and column node degrees) follow Beta distributions
with given shape parameters. Beta distributions are used because they
can flexibly capture many different distributional shapes:
A \(100 \times 100\) incidence matrix with approximately uniformly distributed row and column marginals:
<- incidence.from.distribution(R = 100, C = 100, P = 0.2,
I rowdist = c(1,1), coldist = c(1,1))
#>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022) to generate a random incidence matrix with 100 rows whose sums are approximately distributed as B(1,1), and 100 columns whose sums are approximately distributed as B(1,1).
#>
#> Neal, Z. P. (2022). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")
A \(100 \times 100\) incidence matrix with approximately right-tail distributed row and column marginals:
<- incidence.from.distribution(R = 100, C = 100, P = 0.2,
I rowdist = c(1,10), coldist = c(1,10))
#>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022) to generate a random incidence matrix with 100 rows whose sums are approximately distributed as B(1,10), and 100 columns whose sums are approximately distributed as B(1,10).
#>
#> Neal, Z. P. (2022). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")
A \(100 \times 100\) incidence matrix with approximately left-tail distributed row and column marginals:
<- incidence.from.distribution(R = 100, C = 100, P = 0.2,
I rowdist = c(10,1), coldist = c(10,1))
#>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022) to generate a random incidence matrix with 100 rows whose sums are approximately distributed as B(10,1), and 100 columns whose sums are approximately distributed as B(10,1).
#>
#> Neal, Z. P. (2022). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")
A \(100 \times 100\) incidence matrix with approximately normally distributed row and column marginals:
<- incidence.from.distribution(R = 100, C = 100, P = 0.2,
I rowdist = c(10,10), coldist = c(10,10))
#>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022) to generate a random incidence matrix with 100 rows whose sums are approximately distributed as B(10,10), and 100 columns whose sums are approximately distributed as B(10,10).
#>
#> Neal, Z. P. (2022). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")
A \(100 \times 100\) incidence matrix with approximately constant row and column marginals:
<- incidence.from.distribution(R = 100, C = 100, P = 0.2,
I rowdist = c(10000,10000), coldist = c(10000,10000))
#>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022) to generate a random incidence matrix with 100 rows whose sums are approximately distributed as B(10000,10000), and 100 columns whose sums are approximately distributed as B(10000,10000).
#>
#> Neal, Z. P. (2022). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
rowSums(I)
#> [1] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#> [26] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#> [51] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#> [76] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
colSums(I)
#> [1] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#> [26] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#> [51] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
#> [76] 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
Different types of Beta distributions can be combined. For example, we can generate a \(100 \times 100\) incidence matrix in which the row marginals are approximately right-tailed, but the column marginals are approximately left-tailed:
<- incidence.from.distribution(R = 100, C = 100, P = 0.2,
I rowdist = c(1,10), coldist = c(10,1))
#>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022) to generate a random incidence matrix with 100 rows whose sums are approximately distributed as B(1,10), and 100 columns whose sums are approximately distributed as B(10,1).
#>
#> Neal, Z. P. (2022). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
hist(rowSums(I), main = "Row Marginals")
hist(colSums(I), main = "Column Marginals")
Focus theory suggests that social networks form, in part, because individuals share foci such as activities that create opportunities for interaction (Feld 1981). Individuals’ memberships in foci can be represented by an incidence matrix or bipartite graph. The social network that may emerge from these foci memberships can be obtained via bipartite projection, which yields an adjacency matrix or unipartite graph in which people are connected by shared foci (Breiger 1974; Neal 2014).
Focus theory therefore explains how incidence/bipartite \(\rightarrow\) adjacency/unipartite.
However, it is also possible that individuals’ interactions in a social
network can lead to the formation of new foci. That is, it is possible
that adjacency/unipartite \(\rightarrow\) incidence/bipartite. The
incidence.from.adjacency()
function implements three
generative models (model = c("team", "group", "blau")
) that
reflect different ways that this might occur. These models are
illustrated below, but are described in detail by Neal (2023).
The teams model mirrors a team formation process (Guimera et al. 2005) that depends on the structure of a given network in which cliques represent prior teams. Each row in the generated incidence matrix represents an agent in a given social network, while each column in the incidence matrix records the members of a team. Teams are formed from the incumbants of a randomly selected prior team (with probability \(p\)) and newcomers (with probability \(1-p\)).
Given an initial social network among 15 people, we can simulate
their formation of one (k = 1
) new team, where there is a
p = 0.75
probability that a prior team member joins the the
new team:
<- erdos.renyi.game(15, .5) #A random social network of 15 people, as igraph
G <- incidence.from.adjacency(G, k = 1, p = .75, model = "team") #Teams model
I #>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022a) to generate a random bipartite graph from a unipartite graph. The bipartite graph represents the memberships of the 15 nodes from the unipartite network in 1 newly-formed teams (Neal, 2022b).
#>
#> Neal, Z. P. (2022a). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
#> Neal, Z. P. (2022b). The Duality of Networks and Foci: Generative Models of Two-Mode Networks from One-Mode Networks. arXiv:2204.13670 [cs.SI]. https://doi.org/10.48550/arXiv.2204.13670
class(I) #Incidence matrix returned as igraph object
#> [1] "igraph"
V(I)$shape <- ifelse(V(I)$type, "square", "circle") #Add shapes
plot(G, main="Prior Team Collaborations")
plot(I, layout = layout_as_bipartite(I), main="New Team")
Notice that because the social network G
is supplied as
a igraph
object, the generated object I
is
returned as an igraph
bipartite network, which facilitates
subsequent plotting and analysis. In this example, a new team (node 16)
is formed by four agents (nodes 4, 8, 11, and 15). The function
simulates the formation of the team invisibly, so we cannot see exactly
why this particular team formed. This team may have emerged from the
prior 4-member team of 4, 8, 13, and 15 (they are a clique in the social
network), where three positions on the new team are filled by incumbents
(4, 8, and 15), while the final position is filled by a newcomer
(11).
The clubs model mirrors a social club formation process (Backstrom et al. 2006) in which current club members try to recruit their friends. To ensure a minimum level of group cohesion, potential recruits join a newly-forming club only if doing so would yield a club in which the members’ social ties have a density of at least \(p\). Each row in the generated incidence matrix represents an agent in a given social network, while each column in the incidence matrix records the members of a club.
Given an initial social network among 15 people, we can simulate
their formation of one (k = 1
) new club, where the club has
a minimum density of p = 0.75
:
<- erdos.renyi.game(15, .33) #A random social network of 15 people, as igraph
G <- incidence.from.adjacency(G, k = 1, p = .75, model = "club") #Groups model
I #>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022a) to generate a random bipartite graph from a unipartite graph. The bipartite graph represents the memberships of the 15 nodes from the unipartite network in 1 newly-formed club (Neal, 2022b).
#>
#> Neal, Z. P. (2022a). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
#> Neal, Z. P. (2022b). The Duality of Networks and Foci: Generative Models of Two-Mode Networks from One-Mode Networks. arXiv:2204.13670 [cs.SI]. https://doi.org/10.48550/arXiv.2204.13670
V(I)$shape <- ifelse(V(I)$type, "square", "circle") #Add shapes
plot(G, main="Social Network")
plot(I, layout = layout_as_bipartite(I), main="New Group")
In this example, a new club (node 16) is joined by five agents (nodes 3, 6, 10, 13, and 14). The social network among these group members is very cohesive (density = .9). The members of this group may have attempted to recruit other friends, but these others did not join because doing so would have reduced the group’s cohesion. For example, agent 10 may have tried to recruit agent 9. However, if agent 9 had joined the club, the new club would have a density of 0.73, which is lower than the minimum threshold. Therefore, agent 9 did not join the club.
The Organizations model mirrors an organizational recruitment process (McPherson 1983). The given social network is embedded in a \(d\) dimensional social space in which the dimensions are assumed to represent meaningful social distinctions, such that socially similar people are positioned nearby. Organizations recruit members from this space, recruiting people inside their niche with probability \(p\), and outside their niche with probability \(1-p\). Each row in the generated incidence matrix represents an agent in a given social network, while each column in the incidence matrix records the members of an organization.
Given a social network among 15 people, we can simulate their
recruitment by one (k = 1
) new organization, where there is
a p = 0.95
probability that an individual inside an
organization’s niche becomes a member:
<- erdos.renyi.game(15, .5) #A random social network of 15 people, as igraph
G <- incidence.from.adjacency(G, k = 1, p = .95, model = "org") #Groups model
I #>
#> === Suggested manuscript text and citations ===
#> We used the incidentally package for R (v1.0.2; Neal, 2022a) to generate a random bipartite graph from a unipartite graph. The bipartite graph represents the memberships of the 15 nodes from the unipartite network in 1 newly-formed organizations (Neal, 2022b).
#>
#> Neal, Z. P. (2022a). incidentally: An R package to generate incidence matrices and bipartite graphs. OSF Preprints. https://doi.org/10.31219/osf.io/ectms
#> Neal, Z. P. (2022b). The Duality of Networks and Foci: Generative Models of Two-Mode Networks from One-Mode Networks. arXiv:2204.13670 [cs.SI]. https://doi.org/10.48550/arXiv.2204.13670
V(I)$shape <- ifelse(V(I)$type, "square", "circle") #Add shapes
plot(G, layout = layout_with_mds(G), main="Social Network")
plot(I, layout = layout_as_bipartite(I), main="New Organizations")
The social network is plotted using a Multidimensional Scaling
layout, and therefore shows the nodes’ positions in the abstract Blau
Space from which organizations recruit members. In this example, a new
organization (node 16) recruits four members (nodes 2, 8, 10, and 14).
This organization’s niche is located on the left side of the space,
where it very successfully (because p = 0.95
) recruits all
the people in this region.
The curveball()
function uses the curveball algorithm
(Strona et al. 2014) to randomize an
incidence matrix or bipartite graph while preserving its marginal/degree
sequence. The result is uniformly randomly sampled from the space of all
incidence matrices (bipartite graphs) with the give marginal (degree)
sequences.
For example, given a \(3 \times 3\) matrix with row marginals \(R = \{2,1,1\}\) and column marginals \(C = \{1,1,2\}\), we can randomly sample a new matrix with the same row and column marginals:
<- matrix(c(1,0,0,0,1,0,1,0,1),3,3)
I
I#> [,1] [,2] [,3]
#> [1,] 1 0 1
#> [2,] 0 1 0
#> [3,] 0 0 1
curveball(I)
#> [,1] [,2] [,3]
#> [1,] 1 1 0
#> [2,] 0 0 1
#> [3,] 0 0 1
The add.blocks()
function shuffles an incidence matrix
or bipartite graph to have a block structure or planted partition while
preserving the row and column marginals/degrees. For example, after
generating a 10 \(\times\) 10 incidence
matrix with a density of .2, we can plant a partition in which the
within-group density = 0.75
. In this example, the first 5
row nodes and first 5 column nodes are assigned to block 1, while the
second 5 row nodes and second 5 column nodes are assigned to block
2:
<- incidence.from.probability(R = 10, C = 10, P = .2)
I <- add.blocks(I, density = .75,
blocked rowblock = c(1,1,1,1,1,2,2,2,2,2),
colblock = c(1,1,1,1,1,2,2,2,2,2))
#Original matrix
I #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0 0 1 1 1 0 0 0 0 0
#> [2,] 1 1 0 0 0 1 1 0 0 0
#> [3,] 1 0 0 0 0 1 0 1 0 1
#> [4,] 0 0 0 1 0 0 1 0 1 1
#> [5,] 0 0 1 0 0 0 1 0 0 0
#> [6,] 0 0 1 1 0 0 0 1 0 0
#> [7,] 0 0 0 0 1 0 0 0 0 0
#> [8,] 0 1 1 0 0 0 0 1 1 0
#> [9,] 0 0 0 0 0 1 0 0 0 0
#> [10,] 0 0 0 0 1 1 0 0 0 0
#Blocked matrix
blocked #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0 0 1 1 1 0 0 0 0 0
#> [2,] 1 1 1 0 0 1 0 0 0 0
#> [3,] 1 1 1 0 1 0 0 0 0 0
#> [4,] 0 0 0 1 1 0 1 0 0 1
#> [5,] 0 0 1 0 0 0 1 0 0 0
#> [6,] 0 0 0 1 0 0 1 1 0 0
#> [7,] 0 0 0 0 0 0 0 0 1 0
#> [8,] 0 0 0 0 0 1 0 1 1 1
#> [9,] 0 0 0 0 0 1 0 0 0 0
#> [10,] 0 0 0 0 0 1 0 1 0 0
all(rowSums(I)==rowSums(blocked)) #Row marginals preserved
#> [1] TRUE
all(colSums(I)==colSums(blocked)) #Column marginals preserved
#> [1] TRUE