Mercurial Hosting > traffic-intelligence
view python/ml.py @ 190:36968a63efe1
Finally got connected_components to work, using vecS for the vertex list in the adjacency_list.
In this case, the component map is simply a vector of ints (which is the type of UndirectedGraph::vertex_descriptor (= graph_traits<FeatureGraph>::vertex_descriptor) and probably of UndirectedGraph::vertices_size_type).
To use listS, I was told on the Boost mailing list:
>> If you truly need listS, you will need to create a vertex index
>> map, fill it in before you create the property map, and pass it to the
>> vector_property_map constructor (and as a type argument to that class).
It may be feasible with a component map like:

shared_array_property_map<graph_traits<FeatureGraph>::vertex_descriptor,
                          property_map<FeatureGraph, vertex_index_t>::const_type>
    components(num_vertices(g), get(vertex_index, g));
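For intuition, the vecS case works because vertex descriptors are just the integers 0..num_vertices-1, so the component map can be a plain array of ints indexed by vertex, exactly as the commit message says. A minimal pure-Python sketch of that idea (a hypothetical standalone helper, not part of ml.py or of Boost):

```python
from collections import defaultdict

def connected_components(num_vertices, edges):
    '''Labels each vertex 0..num_vertices-1 with a component id.
    The returned components list plays the role of BGL's component map:
    a plain vector of ints indexed by vertex descriptor.'''
    adjacency = defaultdict(list)
    for u, v in edges: # undirected graph: store both directions
        adjacency[u].append(v)
        adjacency[v].append(u)
    components = [-1]*num_vertices # -1 means not yet visited
    nComponents = 0
    for start in range(num_vertices):
        if components[start] == -1: # unvisited vertex: new component
            components[start] = nComponents
            stack = [start]
            while stack: # depth-first traversal of the component
                u = stack.pop()
                for v in adjacency[u]:
                    if components[v] == -1:
                        components[v] = nComponents
                        stack.append(v)
            nComponents += 1
    return nComponents, components

# 5 vertices, edges forming two components {0,1,2} and {3,4}
n, comp = connected_components(5, [(0, 1), (1, 2), (3, 4)])
# n is 2 and comp is [0, 0, 0, 1, 1]
```

With listS, descriptors are not integers, which is why an explicit vertex index map has to be supplied, as the mailing-list answer above explains.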
author:   Nicolas Saunier <nicolas.saunier@polymtl.ca>
date:     Wed, 07 Dec 2011 18:51:32 -0500
parents:  d70e9b36889c
children: 5957aa1d69e1 8bafd054cda4
#! /usr/bin/env python
'''Libraries for machine learning algorithms'''

__metaclass__ = type

class Centroid:
    'Wrapper around instances to add a counter'
    def __init__(self, instance, nInstances = 1):
        self.instance = instance
        self.nInstances = nInstances

    # def similar(instance2):
    #     return self.instance.similar(instance2)

    def add(self, instance2):
        'Updates the centroid to the running mean of the instances seen so far'
        self.instance = self.instance.multiply(self.nInstances)+instance2
        self.nInstances += 1
        self.instance = self.instance.multiply(1/float(self.nInstances))

    def average(self, c):
        'Returns a new Centroid averaging self and c, weighted by their instance counts'
        inst = self.instance.multiply(self.nInstances)+c.instance.multiply(c.nInstances)
        inst = inst.multiply(1/float(self.nInstances+c.nInstances))
        return Centroid(inst, self.nInstances+c.nInstances)

    def draw(self, options = ''):
        from matplotlib.pylab import text
        self.instance.draw(options)
        text(self.instance.position.x+1, self.instance.position.y+1, str(self.nInstances))

def clustering(data, similar, initialCentroids = []):
    '''k-means-like algorithm with a similarity function

    Two instances are put in the same cluster if the similar function
    returns true for them. Instances are assumed to support multiplication
    by a scalar and addition, so that the average centroid of a set of
    instances can be computed. The number of clusters is determined
    accordingly.

    data: list of instances
    similar: boolean function of two instances
    initialCentroids: optional list of Centroid objects to start from'''
    from random import shuffle
    from copy import copy, deepcopy
    localdata = copy(data) # shallow copy to avoid modifying data
    shuffle(localdata)
    if initialCentroids:
        centroids = deepcopy(initialCentroids)
        remaining = localdata
    else:
        centroids = [Centroid(localdata[0])]
        remaining = localdata[1:]
    for instance in remaining:
        i = 0
        while i < len(centroids) and not similar(centroids[i].instance, instance):
            i += 1
        if i == len(centroids): # no similar centroid: open a new cluster
            centroids.append(Centroid(instance))
        else:
            centroids[i].add(instance)
    return centroids
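A self-contained usage sketch of the clustering function. The Point class, the 1-unit threshold, and the sample data are made up for illustration; the Centroid and clustering code is condensed from ml.py above (without the initialCentroids branch), since in the library the instances come from elsewhere and implement the same multiply/+ interface:

```python
from copy import copy
from random import shuffle

class Point:
    'Toy 1D instance type providing the multiply/+ interface Centroid expects'
    def __init__(self, x):
        self.x = x
    def multiply(self, alpha):
        return Point(self.x*alpha)
    def __add__(self, other):
        return Point(self.x+other.x)

class Centroid:
    'Condensed from ml.py above: an instance plus an instance counter'
    def __init__(self, instance, nInstances = 1):
        self.instance = instance
        self.nInstances = nInstances
    def add(self, instance2):
        # running mean: instance <- (instance*n + instance2)/(n+1)
        self.instance = self.instance.multiply(self.nInstances)+instance2
        self.nInstances += 1
        self.instance = self.instance.multiply(1/float(self.nInstances))

def clustering(data, similar):
    'Condensed from ml.py above: greedy clustering with a similarity function'
    localdata = copy(data)
    shuffle(localdata)
    centroids = [Centroid(localdata[0])]
    for instance in localdata[1:]:
        i = 0
        while i < len(centroids) and not similar(centroids[i].instance, instance):
            i += 1
        if i == len(centroids): # no similar centroid: open a new cluster
            centroids.append(Centroid(instance))
        else:
            centroids[i].add(instance)
    return centroids

def similar(p1, p2):
    return abs(p1.x-p2.x) < 1. # same cluster if closer than 1 unit (arbitrary threshold)

centroids = clustering([Point(0.), Point(0.2), Point(5.), Point(5.1)], similar)
# expected: two centroids, near 0.1 and 5.05, each counting 2 instances
```

Note that the number of clusters is not fixed in advance: a new centroid is opened whenever an instance is similar to none of the existing ones, and since the data is shuffled, which instance seeds each cluster varies between runs.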