GeoCommunity Mailing List Archives
Subject: Re: [gislist] cluster or group points in AV3.2
Date: 01/07/2004 03:25:00 PM
From: Quantitative Decisions

At 03:18 PM 1/7/2004 -0500, RICK GRAY wrote:
>My problem is this: I have a few hundred points on a map. I need to
>group them (to make territories). Is there a script or a method to
>cluster these into X points per territory. Or to divide the total into Y
>territories ...

This is only half the statement of the problem: it more or less describes some of the data. What you need to think hard about is what makes a good cluster. To this end, people have devised a huge number of clustering techniques: K-means, linkage, and tree methods, among others, most of which solve some form of optimization problem.
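
To make "some form of optimization problem" concrete, here is a minimal sketch of K-means (Lloyd's algorithm), which tries to minimize the sum of squared distances from each point to its cluster center. It is an illustration only, not an ArcView script: it assumes planar (x, y) coordinates, Euclidean distance, and a cluster count k fixed in advance. The function name `kmeans` and its parameters are made up for this example.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on (x, y) tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # assumption: start from k random points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center in place
        if new_labels == labels and new_centers == centers:
            break                               # nothing changed: converged
        labels, centers = new_labels, new_centers
    return centers, labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, 2)
```

Note that this finds only a local optimum that depends on the starting centers, and it places no limit on how many points land in each cluster.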
Here are some fundamental considerations that strongly influence the solution method (in no particular order):
(1) Will you specify how many clusters must be created, or should the number be part of the solution ("natural" clustering)?
(2) How will you measure how clustered a group of points is? Here are some metrics commonly encountered:
-- Maximum distance between two points
-- Mean inter-point distance
-- Root mean squared inter-point distance
-- Maximum distance to nearest neighbor
-- Maximum separation relative to another cluster
Of course, you will also need to decide which distance function these measurements use. For a "territory" the usual choices are geodesic distance (Euclidean or spherical), distance along a transportation network, travel time along a network, or a "cost-distance" function.
(3) Will clustering be based only on location or will it depend also on point attributes? If the latter, this leads to a higher-dimensional clustering problem, typically amenable to similar solution techniques, but you will have to quantify the trade-offs between geographic proximity and attribute similarity.
(4) Will clusters need to be separated by "nice" boundaries, such as straight lines (this leads to a branch of statistical theory called linear discriminant analysis), or do the boundaries not matter?
(5) Do there exist geographic obstacles that prevent certain pairs of points from being contained in the same cluster?
(6) Will there be constraints on cluster size? For instance, in some applications (especially territory construction) you might want all clusters to have an (approximately) equal number of points.
(7) Do small improvements in clustering matter a lot, or would a rough approximate solution be good enough?
(8) Will the problem be a dynamic one, with data constantly changing and the solution developing along with it, or is this a static situation where little is likely to change for a long time?
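
As an illustration of consideration (2), the first four measures listed there can be computed for a single group of points as sketched below, with the distance function passed in so that Euclidean and spherical distance are interchangeable. The names `compactness` and `haversine_km` are made up for this example, the Earth radius of 6371 km is the usual spherical approximation, and network or cost distances would need their own function.

```python
import math
from itertools import combinations

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def compactness(points, dist=math.dist):
    """Measures of how tightly a group of two or more points is clustered."""
    pair = [dist(p, q) for p, q in combinations(points, 2)]
    # For each point, the distance to its closest other point in the group.
    nearest = [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return {
        "max_pair": max(pair),
        "mean_pair": sum(pair) / len(pair),
        "rms_pair": math.sqrt(sum(d * d for d in pair) / len(pair)),
        "max_nearest_neighbor": max(nearest),
    }

planar = compactness([(0, 0), (3, 4), (0, 4)])                 # Euclidean
spherical = compactness([(40.0, -75.0), (40.5, -74.5), (39.9, -75.2)],
                        dist=haversine_km)                      # kilometers
```

Which of these numbers you choose to minimize is exactly the decision the question asks for; the different measures can rank the same candidate groupings differently.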
Because these considerations are so important, any omnibus solution suggested to you (such as the ubiquitous Thiessen polygon construction) will be a good one purely by accident. Finding precise answers to these questions first will suggest how to proceed. There do exist ArcView 3.x-based solutions, although usually they require the assistance of a DLL to do the heavy computational work.
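
To show how one answer changes the method, suppose the answer to (6) is "equal counts", the answer to (7) is "rough is fine", and only planar location matters. Then a simple heuristic, recursive bisection along the longer axis of the bounding box, already gives territories whose sizes differ by at most one point. This is a sketch under those assumptions only, not one of the ArcView solutions mentioned above, and the name `balanced_split` is made up for this example.

```python
def balanced_split(points, k):
    """Split (x, y) points into k groups whose sizes differ by at most one."""
    if k < 1 or len(points) < k:
        raise ValueError("need at least one point per group")
    if k == 1:
        return [list(points)]
    # Cut across the longer extent of the bounding box.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    # The first part gets a share of points proportional to its share of groups.
    k_first = k // 2
    cut = round(len(ordered) * k_first / k)
    return (balanced_split(ordered[:cut], k_first)
            + balanced_split(ordered[cut:], k - k_first))

pts = [(x, y) for x in range(5) for y in range(4)]   # 20 grid points
territories = balanced_split(pts, 3)                 # three groups of 6 or 7 points
```

The result ignores obstacles, networks, and attributes entirely, so it is reasonable only when those considerations truly do not apply.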
--Bill Huber
Quantitative Decisions