Large-Scale Semantic Networks

Recently as both a followon to my thesis research and as a part of my work for the Search Engine That Starts With A G, I’ve been investigating a number of papers on graph theory as applied to semantic networks and Web link graphs. This work isn’t “new” per se, since these models and techniques for graph analysis go back years if not decades, but it’s only recently – when vast real-world networks have become available in machine-readable form along with the computer power necessary to analyze them – that enough work has been done to illuminate how important it is to analyze the properties of networks in detail.

One of the most interesting papers I’ve encountered so far was The Structure and Function of Complex Networks, a survey of mathematical and empirical studies of networks that I had wish had been available when I was doing my thesis work. The most important result I think to come out of recent graph theory is that simple uniform and random models of networks don’t tell the whole story – now, we have new mathematical tools for modeling a wide range of graph architectures, such as:

Small World Networks. A traditional graph model says nothing about how far you may need to travel to find an arbitrary node in the graph. Think of square tiles on a floor – each tile is connected to four others, but the number of steps needed to reach any tile is a function of the distance. But anyone who’s played the parlor game “Six Degrees of Kevin Bacon” knows that real-world human relationships aren’t arranged this way: everyone is connected to everyone else in the world by a very short chain of relationships – on average, you can reach almost anyone in the world in only six or seven handshakes. This shows up in graph analysis as the “mean geodesic distance” between two nodes in a graph, and is an important property we should measure about our networks as they grow to determine what kind of network structure we are really dealing with. At the very least, a random graph structure shows small-world properties more similar to real-world graphs than uniform graphs, and we should consider using them over uniform models as a basis for analyses.
Scale-Free Networks. Both uniform networks – where every node has the same link structure – and random networks – where nodes are connected at random to each other across the graph – have a definite scale or rough average size. Nodes in a random graph are like people: each one is unique, but their heights are distributed over a defnite scale so there are few people shorter than three feet and no people taller than nine. Real world networks don’t have this definite scale: instead, they look the “same” no matter what size scale you’re looking at. Nodes in a real world graph are like the distribution of city sizes: for every city there are four times as many cities at half that size. This shows up technically in the “degree distribution”: the statistical pattern of the number of links on each node. This will no doubt have significant effects on processes operating over networks like spreading activation.

There are more issues in graph theory than I can readily summarize, including issues like resilience to deletion, models of growth, and so on; many of which are directly relevant to studies of semantic networks and processes that operate over them.

Another paper, The Large-Scale Structure of Semantic Networks, applies these techniques to real-world semantic network models drawn from sources such as WordNet and Roget’s Thesaurus. This paper, like the survey article Graph Theoretic Modeling of Large Scale Semantic Networks, seems to find that real semantic networks have scale-free, small-world properties that aren’t found in the simpler mathematical models that people such as Francis (cough) used in his thesis.

SO, anyone interested in semantic networks or spreading activation as a tool for modeling human cognition or as a representation scheme for an intelligent system would do well to follow up on these references and begin an analysis of their systems based on these “new” techniques (many of which have been around for a while, but sadly hadn’t reached everyone in the semantic network community (or, at least, hadn’t reached me) until more recently.

So check them out!
-the Centaur

centaur