1,000
Features Per Second for Node-to-Node Trace!
Description
of an Electric Trace for GTViewer using Bentley Utility Designer Data
Huntsville
Utilities
Begun
April 3, 2013
Revised
January 14, 2014
Summary
Network
traces have historically been slow – 1 feature per second, maybe 10, maybe 100. This paper shows how to achieve 1,000
features per second using standard data structures. The primary innovation is
to pre-process certain portions of the data into text files which are loaded
into memory (into hash tables) when the trace application is first
started. Most of the data access during
the trace uses these hash tables (which are one of the fastest data structures
available) instead of disk reads, so the trace runs faster than most.
Audience
The
paper is intended for anyone writing a network trace, whether with GTViewer or
not, whether with data from Bentley or not.
The trace is implemented using GTViewer on data that has been converted
from Bentley to the format used by GTViewer, so I will necessarily get down
into the details of GTViewer and Bentley in order to explain what’s going
on. But an interested reader should be
able to understand how to implement the same concepts in his own environment.
Network
tracing is one of many practical applications of Graph Theory. Here is the definition from Wikipedia: (http://en.wikipedia.org/wiki/Graph_theory)
“In mathematics and computer
science, graph theory is the study of graphs, which are mathematical structures
used to model pairwise relations between objects. A "graph" in this
context is made up of "vertices" or "nodes" and lines
called edges that connect them. A graph may be undirected, meaning
that there is no distinction between the two vertices associated with each
edge, or its edges may be directed from one vertex to another; see graph (mathematics) for more detailed
definitions and for other variations in the types of graph that are commonly
considered. Graphs are one of the prime objects of study in discrete mathematics.”
Why is Graph Theory important to GIS?
Because our electric, gas, telephone, water, sewer, and storm drain
networks are easily modeled as graphs,
thereby allowing all of the many tools from graph theory to be used to analyze
GIS data.
A Note on .NET’s NameValueCollection
Before
we get very far, I should share some things about a data structure that .NET
calls a “NameValueCollection”. It is
implemented as a hash table (actually .NET will determine how big the thing is
going to be and implement either a hash table or a simpler structure, whichever
will be faster, or so I’ve read in Rod Stevens, Visual Basic 2005.) The programmer adds keys and values. Then when he has a key, he can retrieve the
value quickly.
I
should note here one very cool thing that .NET does with the
NameValueCollection. You don’t need to
know at run time how many values you may wind up with for a given key. In most cases, you have a single value for a
key. But the value returned by
GetValues(key) in .NET is a List(Of String).
So in case there is only one value, it’s value(0). But you could add several values over the
course of processing and parse them when you retrieve them. I use this capability during the trace to find
nodes for electric primaries.
Pre-Processing
Let
me describe the pre-processing first.
During development, of course, this came last. I got the trace working and then looked for
ways to speed it up and wrote the pre-processing stuff then. But the interesting aspect of the trace depends
on the results of pre-processing. The
reader will need to be patient because the reasons for generating various text
files during pre-processing may be poorly motivated if you don’t know what the
trace logic is going to do with them.
There is a flow chart below on page 8, and I hope it’s helpful.
A
word on “keys”. GTViewer uses a one- or
two-key system to link graphic elements to tabular data. Older readers are welcome to think DMRS,
MSLINK, FRAMME, etc. In most of what
follows, I will only be concerned with the second key because the first key for
Bentley data as converted to GTViewer is always zero (or some other arbitrary
value set at conversion time).
The VB.NET Project: HU_BuildConnectivityTable
HU_BuildConnectivityTable
builds six text files:
1.
Connectivity.txt
2.
ConductorTable.txt
3.
StopFeature.txt
4.
StopNode.txt
5.
UnitTransformerTable.txt
6.
TransformerThick.txt
1.
Building
Connectivity.txt
We begin
by opening a stream reader for GTViewer’s “data.txt” file, the record of
tabular data for all features. It reads
line-by-line until it finds the table for “E_OHPRIMARY”. Then it establishes offsets for the
attributes "PRIMARY_STARTID_Remap" and
"PRIMARY_ENDID_Remap".
(Why? And what do I mean by this? In data.txt, GTViewer has a line for the tabular
data for each element. Attribute values
are delimited by pipe characters. Each
table section begins with a description of its structure, which includes a list
of the attributes in the order they appear in the pipe-delimited list. So it is a trivial programming activity in
.NET to “Split” each line of attributes into an array where I know the index of
the PRIMARY_STARTID_Remap and PRIMARY_ENDID_Remap attributes. Why “_Remap”?
The conversion from Bentley to GTViewer retains the PRIMARY_STARTID and
PRIMARY_ENDID attribute values as they exist in Bentley, but it also remaps
these somewhat cumbersome GUID-style values to unique integer values that are
easier to work with.)
Back
to data.txt. We read each line and write
the GTViewer key2 and the start node (PRIMARY_STARTID_Remap) and end node (PRIMARY_ENDID_Remap),
pipe delimited, to the text file Connectivity.txt.
Later
on, when the user starts the trace application, this text file will be read and
loaded into a NameValueCollection called “"nvcNodes” with key2 as the key.
So
what? “So what” is that at run time, I
will be able to retrieve the start and end nodes of a primary if I know the
graphic-tabular link of the primary.
Specific motivation for that will need to wait for the description of
the recursive subroutine at the heart of the trace code.
Electricity
travels over more features than OHPRIMARY, so the code also reads the
“E_UGPRIMARY” and “E_SWCABBUSBAR” tables and appends the pipe-delimited key2|start_node|end_node
string to the file Connectivity.txt.
(Looking ahead to a
future version, the code also has provision for E_OHSECONDARY, E_UGSECONDARY,
E_OHSERVICE, and E_UGSERVICE, but our data at Huntsville Utilities does not
have these values sufficiently populated with accurate values so that a
secondary trace can run.)
2.
Building
ConductorTable.txt
Slightly
different in implementation, but with similar results. We run a GTViewer query for E_OHPRIMARY and
add key2 and ”|” and “E_OHPRIMARY” to a
List(Of String). We then run another
query for E_UGPRIMARY, and another for E_SWCABBUSBAR. Then we walk through that List and write
ConductorTable.txt.
How
can I motivate such a simple-minded activity?
At times during the trace, the code will have key2 for a primary, and it
will need to now what table that primary is in.
In early versions of the trace, I found this table in a clunky way that
took a lot of time. Currently, I load up
another NameValueCollection (hash table) called “nvcConductorTable” when the
user starts the trace, so the look-up is brisk.
3.
Building
StopFeature.txt
Similar
to building ConductorTable.txt except that it takes two steps. First, we build a StopFeatureList by running a
query to find every switch. That
sentence hides some complexity. I
compile a GTViewer query that returns the PRIMARY_ATTACHEDEDGEID for each
E_OHSWITCH. Then I compile similar
queries for each type of stop feature.
Because each of these queries only asks for PRIMARY_ATTACHEDEDGEID, I
can append them into a single query.
GTViewer queries (when run in code) return the attribute(s) you ask for,
and they also return the category and offset of each element they find. The category for all primary features is 304,
so we know that without the query. But
the offset is golden. We write the
PRIMARY_ATTACHEDEDGEID and “|” and the offset of the switch to a list. In .NET
speak: a List(Of String) called “StopFeatureList”.
Then
we walk through StopFeatureList and use the PRIMARY_ATTACHEDEDGEID value to run
another query to find primaries by PRIMARY_NETID. The PRIMARY_NETID for a primary is, of
course, the same number as the PRIMARY_ATTACHEDEDGEID for a stop feature that
is attached to this primary. So as the
query returns primaries which we know have stop features attached, we note the
key2 of the primary and write it to a temporary list with a pipe symbol and the
PRIMARY_NETID. Then we write this list
to the file StopFeature.txt.
Again,
so what? Well, after loading StopFeature.txt
into a NameValueCollection called “nvcStopFeature” when the user starts the
trace application, we can quickly test whether a primary has a stop feature
attached. We use the key2 of the primary
as a key to nvcStopFeatures. If we get any
values, then the primary has a stop feature attached. If not, it doesn’t.
4.
Building StopNode.txt
We
use StopFeatureList built during the previous step. We walk through it, pulling out the
PRIMARY_ATTACHEDEDGEID (which, of course, is the PRIMARY_NETID of the primary
that has a Switch or other stop feature) and and the offset of the stop
feature.
First we build a
short list called “shortList”, a 2-element array (Of String), that holds the
nodes (PRIMARY_STARTID_Remap and PRIMARY_ENDID_Remap) for this primary. The stop node will be one of these. Perhaps both.
Then we use the
offset to do a GTViewer proximity search (within 50 feet of the origin of the
stop feature) to build a list of nodes called “nodeList” from nearby
primaries. (PRIMARY_STARTID_Remap and
PRIMARY_ENDID_Remap) We omit the primary
that has the stop feature attached, because we already have those nodes in
shortList.
Then, if nodeList
contains shortList(0), then we add it to StopNodeList along with “|” and the
offset of the stop feature.
Also, if nodeList
contains shortList(1), then we add it to StopNodeList along with “|” and the offset
of the stop feature.
Then
we process the next item in StopFeatureList.
Then
we write StopNodeList to the file StopNode.txt.
When the trace initializes, we load StopNode.txt into the
NameValueCollection “nvcNodeOffsets”.
So
now, while we search from node to node, we can quickly tell if we have reached
a node where the trace should stop. To
appreciate why I take the two steps of building both StopNode.txt and
StopFeature.txt, see “Special Bentley Considerations” below.
5.
Building
UnitTransformerTable.txt and TransformerThick.txt
We
can skip these. They are not used for
the trace proper, but are used to sum transformer load by phase for a
circuit. I build the file
UnitTransformerTable.txt first and use it to help build the file
TransformerThick.txt. I do it that way
because some transformers have a KVA for a phase, and some have child records
with different KVA ratings for each of the children. The point is that, as is becoming boring to
you by now, I load this file into a NameValueCollection when the user starts
the trace application, and then I can use the node number (on currently
processing primary, which is equal to the PRIMARY_NETID of the attached
transformer) to get the KVA load for each phase.
Summary of Pre-processing
File name
|
Contents
|
Meaning
|
Comment
|
ConnectivityTable.txt (used to build nvcNodes)
|
key2
|
linkage for E_ OHPRIMARY, E_UGPRIMARY, or E_SWCABBUSBAR
|
Written on graphic element; points to its tabular data.
|
node1
|
PRIMARY_STARTID_Remap
|
PRIMARY_STARTID mapped to an integer.
|
|
node2
|
PRIMARY_ENDID_Remap
|
PRIMARY_ENDID mapped to an integer.
|
|
ConductorTable.txt (used to build nvcConductorTable)
|
key2
|
linkage for E_ OHPRIMARY, E_UGPRIMARY, or E_SWCABBUSBAR
|
Written on graphic element; points to its tabular data.
|
table name: E_ OHPRIMARY, E_UGPRIMARY, or E_SWCABBUSBAR
|
|||
StopFeature.txt
(used to build nvcStopFeature)
|
key2
|
linkage for E_ OHPRIMARY, E_UGPRIMARY, or E_SWCABBUSBAR
that has a stop feature attached to it.
|
Written on graphic element; points to its tabular data.
|
PRIMARY_NETID
|
Link to PRIMARY_ATTACHEDEDGEID for the stop feature
|
All I really need is for key2 to have a value in the
NameValueCollection.
|
|
StopNode.txt
(used to build nvcNodeOffsets)
|
node
|
PRIMARY_STARTID_Remap or PRIMARY_ENDID_Remap
|
|
offset
|
Position of the stop feature in the Primary category
|
Allows the trace to find the origin of the stop feature
in order to circle it. Other than the
circle, all I need is a value in the NameValueCollection.
|
Looking over the summary table above, it may seem that
I’m force fitting the stop feature and stop node into a NameValueCollection
when all I need is a list, array, or stack.
I tried these data structures, and they were not as fast as the
NameValueCollection. My guess is that a
list, array, or stack is searched in sequential order, so the seek time is N/2,
whereas the seek time for a NameValueCollection (a hash table) is going to be
more like a fixed value between 1 and 2.
On
my four year old Dell laptop, HU_BuildConnectivityTable runs in 121 seconds (call
it 2 minutes) on a data set with 89,297 primaries, including overhead
primaries, underground primaries, and bus bars.
This run time includes building the Transformer text files. HU_TraceElectric takes about 2 or 3 seconds
to initialize, which includes loading the text files into
NameValueCollections. I consider these
to be acceptable times when they enable a node-to-node trace to run at about
1,000 features per second.
The Trace Logic
The Trace Logic
The
VB.NET project is called HU_ElectricTrace.
It is run as an external application in GTViewer. Here is the essential section of the form
shown to the user:
A
simplified flowchart is presented on the next page.
Simplified Flow Chart
Sequence of Actions
The
user clicks the “Trace Primary” button on the form and then clicks on a breaker
at a substation (E_PRIMARYCIRCUIT). A
current limitation is that the trace can only start with a substation
breaker. Future versions will allow the
user to start on any conducting feature.
Then
we determine the name of the circuit from that feature and write it to the
Listbox on the form.
Then
we get keys for the breaker and determine its node number (PRIMARY_NETID_Remap). This is the initialStartNode. We push that
node on the node stack, push that node on the nodes visited stack, and pass it
to the Node Iterator.
Node Iterator
The
sub “Node Iterator” is the recursive subroutine at the heart of the trace.
It
runs a special GTViewer query called a “connectivity query”, that returns 2
arrays when fed a node number. One array
is a list of key1 values. The other is a
list of key2 values. It also returns a
count of the number of values returned, so the programmer writes a For…Next
loop that processes each set of key1-key2 pairs in turn. The keys are the links to every primary that
is connected to the node. In our case,
we need only use key2.
(Now
that I have learned a few things by working on this trace, I realize I could
write an equivalent to the connectivity query.
It would be a NameValueCollection with the node as the key and a list of
key2’s as the values. When you define
the connectivity query, you have to provide the sorts of features that can have
nodes you want to consider, as well as the table and attribute names for the
nodes. For example, E_OHPRIMARY,
PRIMARY_STARTID, PRIMARY_ENDID. My code
works fine using the special GTViewer query; but developers outside the
GTViewer product can just as well use a NameValueCollection with the node
number as key and the list of links to primaries as the values.)
Joey
Rogers, designer and developer of GTViewer, has provided a note on the
connectivity query:
The connectivity
query in GTViewer is performing a Binary Search over records that are
pre-sorted. There is a lot of debate
over using Binary Search vs. Hash Tables to look up values. Binary Searches generally work better if you
don’t have unique keys; Hash Tables
usually work better if you have unique keys.
Since there are many records with the same keys in typical GIS data, and
since the connectivity query uses the same keys for each node of a feature, the
Binary Search was used in GTViewer.
However, Hash tables should work better for this trace application
because you can use unique keys for most/all of the different pre-processed
data in the hash tables.
For
each key2, we check whether it is already in a stack that contains key2 values
for primaries that have already been processed.
(Since the connectivity query returns all the primaries that share that
node, once you are downstream on the trace, it will return the primary you came
from, which you want to disregard. By
historical accident, this stack is called “ufidStack”. I will refer to it later.)
(At the risk of
starting too many side trails, I will note here that if key2 is already in
ufidStack, the code tests whether the count of connected primaries is 1. If so, then we have a leaf node, and we add
its keys to a List(Of Integer) called “leafNodes”. I don’t use this in the current version of
the trace, but in previous versions I did something with all these leaf
nodes. So I kept the ability to keep
track of them, and here is where. After
this leaf node test the code skips to the …Next part of the For…Next loop.)
If
key2 has not already been processed, we use it to get the table for that
feature from the NameValueCollection “nvcConductorTable”.
Now we
test whether the current node is a stop node and whether the primary for which we have key2 is a stop
feature, using simple functions that use nvcStopNode and nvcStopFeature. If both tests pass, we have found a stop node
on a stop feature; so we add the stop feature to a highlight list for stop
features and skip to the next set of keys in the For…Next loop.
If
not, then we keep going.
We
push the key2 of the primary onto the ufidStack. We increment the primary counter. We add the primary to the highlight list. (In GTViewer, there is a concept of adding
something to the highlight list and separately displaying the highlight. This separation allows you to add lots of
features to the highlight list without slowing down by updating the
display.) Based on a user setting on the
trace form, we update the display after processing every nth primary or we wait
until the whole circuit is complete.
Now
we determine the next node to feed Node Iterator.
We
look at the NameValueCollection “nvcNodes”, which was built during
initialization from Connectivity.txt.
The key is the key2 for the primary; the value is the pipe-delimited
list of two nodes. The GetNextNode
function merely parses out the nodes and returns the one that is not already in
the nodesVisited stack.
(Performance
note: It seems that .NET does a (slow)
sequential search to find out if an item is in a List, but does a fast search
to find if an item is in a Stack. I
could be wrong. It may be that the stack
of visited nodes is always relatively small (<2 and="" because="" composed="" each="" fast.="" feeder="" for="" integers="" is="" it="" of="" reinitialized="" s="" so="" span="" style="mso-spacerun: yes;" that="" why=""> 2>Or it may be that even a
sequential search of the stack from the top down will generally find the
previous node soon in its path.)
Not
quite done. This new node may be a stop
node. We add it to the nodesVisited
stack and test whether it is a stop node and whether the feature for which we
have key2 is a stop feature. We may have
just turned up the other end of a feature that is a stop node and need to stop
there. If we do not find both stop node
and feature, we push the new node onto the nodeStack and pass it to Node Iterator,
which feeds the recursion.
The
very next line in the code is “nodeStack.Pop”.
It’s really magical to me. The
recursion can go down a long path, but as it unpeels, the nodeStack pops off
the nodes, and can resume at a node it bypassed earlier. When the last node on the stack is popped,
it’s done.
When the Node Iterator is Done
We
update the view. In case the ShowAsYouGo
box has not been checked, this is the first time the user sees the traced
primaries on the screen.
We
use the primary counter to show how many primaries have been processed.
Then
we call the subroutine PostProcess.
Not
too much of interest for the purposes of this paper, but I will note that if
the box ShowTransformerKvaByPhase has been checked, we show a summary of load
by phase. This may have some interest
for Bentley people. For some types of
transformers, it is necessary to drill down to the child records of a
Transformer Bank to get the phases of the units. The hard work is done in HU_BuildConnectivityTable
where we create another text file that makes it easy (and brisk) at run time.
Special Bentley Considerations
Bentley
models stop features (E_OHSWITCHBANK, E_SWCABFUSEDSWITCH, E_SWCABGROUPSWITCH)
as edge devices. Their attribute PRIMARY_ATTACHEDEDGEID
maps to the PRIMARY_NETID of a conducting feature. Therefore, stop features cannot be placed in
a NODE1-NODE2 pattern. It would simplify
processing tremendously if Switches and other potential stop features had a
node that matched where they connected to the primaries. But it would require a potentially major rework
of the Bentley feature model.
What
the trace code has to do is test whether a node is a stop node and whether the
primary it is currently processing is attached to a stop feature. Only if both conditions are satisfied should
the trace stop. If stop features had a
shared node with the primaries they connect to, this check would be simpler.
For a
Bentley developer, the task of writing all these queries transforms into the
task of writing a series of SQL statements.
I don’t mean to minimize the task, but it should be approachable for an
SQL guy.