Hello. I've got a question about the following example from the cookbook:
https://graphtool.skewed.de/static/doc/demos/inference/inference.html#id11

I work on my own network in exactly the same way, trying to perform sampling to estimate some metrics. But the results in some way replicate the behaviour from the cookbook example: for both cases (simple and nested SBM) the marginal distributions for vertices most of the time have too many nonzero values for different clusters, hence the colouring is so fine-grained. Only a few (12) clusters obey some explicit dominant group membership, but the rest of the clusters exhibit very distributed marginals. Do you have any explanation for this?

In the case of my network I also have only 13 groups of nodes with some explicit dominant group membership, and the rest of the vertices have too many nonzero, almost uniformly distributed marginals. I was thinking that for the simple cookbook example it's not natural that some vertices have more than 10 nonzero marginal values. Maybe it's just the result of independent launches of the MCMC algorithm and the random nature of group labelling? Or is there some intuition behind this high marginal variance in group membership?

I launched the optimisation several times and drew the results. Topologically the outputs were very close to each other, although the colouring was always different except for a few kind of "stable" vertices. Hence, I guess, the resulting marginals for them have the same properties, but the labels are not informative in themselves. Maybe there is some trick to force a deterministic labelling policy to stabilise it?

Thank you,
Valery

_______________________________________________
graphtool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graphtool

On 08.08.2017 01:10, Valery Topinsky wrote:
> I work on my own network in exactly the same way, trying to perform
> sampling to estimate some metrics. But the results in some way replicate
> the behaviour from the cookbook example: for both cases (simple and nested
> SBM) the marginal distributions for vertices most of the time have too
> many nonzero values for different clusters, hence the colouring is so
> fine-grained. Only a few (12) clusters obey some explicit dominant group
> membership, but the rest of the clusters exhibit very distributed
> marginals. Do you have any explanation for this?

This means that the posterior distribution is not concentrated on any particular partition. This implies either that the model is misspecified, i.e. your network does not have well-defined groups, or that it is very noisy.

> In case of my network I also have only 13 groups of nodes with some
> explicit dominant group membership, and the rest of the vertices have too
> many nonzero, almost uniformly distributed marginals. I was thinking that
> for the simple cookbook example it's not natural that some vertices have
> more than 10 nonzero marginal values. Maybe it's just the result of
> independent launches of the MCMC algorithm and the random nature of group
> labelling? Or is there some intuition behind this high marginal variance
> in group membership? I launched the optimisation several times and drew
> the results. Topologically the outputs were very close to each other,
> although the colouring was always different except for a few kind of
> "stable" vertices. Hence, I guess, the resulting marginals for them have
> the same properties, but the labels are not informative in themselves.
> Maybe there is some trick to force a deterministic labelling policy to
> stabilise it?

The marginal distributions represent the actual uncertainty in your data. If you want a single partition to represent it, you have to choose between two extremes of the bias-variance trade-off:

1. Choose the most likely partition, i.e. the one that minimizes the description length (more bias, less variance).

2. Choose the maximum a posteriori estimate for each node, i.e. the most likely node label according to the node marginals (less bias, more variance).

Option 2 averages over the noise, but might not be representative of any particular fit (especially if the number of groups is fluctuating). Option 1 usually underfits, but may also overfit, depending on your data.

There is a discussion of this here: https://arxiv.org/abs/1705.10225

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>
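[Editor's note: the two options above can be sketched in code. In graph-tool, option 1 corresponds to calling minimize_blockmodel_dl(); option 2 can be computed directly from the collected marginals. Below is a minimal numpy-only sketch of option 2, where the pv matrix is made-up toy data standing in for the output of collect_vertex_marginals() (rows are vertices, columns are group labels, entries are visit counts).]

```python
import numpy as np

# Toy stand-in for collected vertex marginals: rows are vertices,
# columns are group labels, entries are visit counts from the MCMC sweeps.
pv = np.array([[90,  5,  5],   # vertex 0: strongly in group 0
               [10, 80, 10],   # vertex 1: strongly in group 1
               [35, 33, 32]])  # vertex 2: nearly uniform -> high variance

# Option 2: maximum a posteriori label per node, from the marginals.
b_map = pv.argmax(axis=1)
print(b_map)  # -> [0 1 0]

# A simple per-vertex "confidence": fraction of samples in the MAP group.
conf = pv.max(axis=1) / pv.sum(axis=1)
print(conf)   # -> [0.9  0.8  0.35]
```

Note how vertex 2 gets a MAP label even though its marginal is almost uniform, which is exactly the "less bias, more variance" caveat: the per-node estimates need not jointly form a partition that was ever visited by the chain.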

Good day,
Thank you for the reply. I want to demonstrate the relevant confusing observation. I ran the example from the cookbook:

    g = collection.data["lesmis"]

    state = BlockState(g, B=20)
    state = state.copy(B=g.num_vertices())

    mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))

    pv = None
    h = np.zeros(g.num_vertices() + 1)

    def collect_marginals(s):
        global pv
        global h
        B = s.get_nonempty_B()
        h[B] += 1
        pv = s.collect_vertex_marginals(pv)

    mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
                     callback=collect_marginals)

    state.draw(pos=g.vp.pos, vertex_shape="pie", vertex_pie_fractions=pv,
               edge_gradient=None)

    # print histogram of nonempty B values
    _ = h.nonzero()[0]
    plt.bar(_, h[_])

I attached the plots. As you can see, the model always uses only a few (nonempty) blocks, from 6 to 9. But at the same time the number of different marginal states (with positive probabilities) for some vertices is around 70 (almost all of the potential 77 = g.num_vertices()). This means that during independent runs the model can end up with a new set of 6 to 9 blocks, just under different labels. This is what I meant by: "Maybe it's just the result of independent launches of the MCMC algorithm and the random nature of group labelling?"

Is there any way to do the sampling without specifying an exact B, but rather sampling B as well, as described in https://arxiv.org/pdf/1705.10225.pdf, Ch. IV?
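[Editor's note: the label-permutation effect described above can be illustrated with a small numpy-only sketch (made-up toy data, independent of graph-tool): a fixed 3-group partition whose group labels are randomly permuted at each sample accumulates marginals that are nonzero over almost the entire label space, even though only 3 groups are ever occupied at once.]

```python
import numpy as np

rng = np.random.default_rng(0)

N, B_total = 10, 10   # 10 vertices, label space of size 10
n_runs = 50

# A fixed "true" partition into 3 groups -- but each sample relabels the
# groups via a random permutation of the whole label space, which the
# model is free to do since the posterior is label-invariant.
b_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

pv = np.zeros((N, B_total))   # accumulated vertex marginals (counts)
for _ in range(n_runs):
    perm = rng.permutation(B_total)
    pv[np.arange(N), perm[b_true]] += 1

# Only 3 groups are occupied in any single sample, yet the accumulated
# marginals are spread over nearly the whole label space:
print((pv > 0).sum(axis=1))  # nonzero marginal entries per vertex, ~9-10 each
```

This reproduces the symptom above: ~6-9 nonempty blocks per sample, but dozens of nonzero marginal entries per vertex after accumulation.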

On 09.08.2017 18:04, topinsky wrote:
> I attached the plots. As you can see, the model always uses only a few
> (nonempty) blocks, from 6 to 9. But at the same time the number of
> different marginal states (with positive probabilities) for some vertices
> is around 70 (almost all of the potential 77 = g.num_vertices()). This
> means that during independent runs the model can end up with a new set of
> 6 to 9 blocks, just under different labels. This is what I meant by:
> "Maybe it's just the result of independent launches of the MCMC algorithm
> and the random nature of group labelling?"

The group labels are arbitrary; you need to map them in a contiguous range before computing the histogram.

> Is there any way to do the sampling without specifying an exact B, but
> rather sampling B as well, as described in
> https://arxiv.org/pdf/1705.10225.pdf, Ch. IV?

This is exactly what happens; this is why your histogram shows many different values for the number of nonempty groups. (The number of total groups, including empty ones, will always grow as necessary.)

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>
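[Editor's note: the relabelling step mentioned above can be sketched with numpy alone; the partition array b below is made-up toy data standing in for a sampled block assignment. np.unique with return_inverse maps arbitrary, non-contiguous labels into the range 0..B-1.]

```python
import numpy as np

# Toy partition: labels are arbitrary and non-contiguous (3, 7, 42),
# even though only three groups are actually occupied.
b = np.array([7, 3, 42, 7, 3, 3, 42])

# Map the labels into a contiguous range before any histogramming:
# `labels` holds the distinct original labels, `b_contiguous` the
# remapped partition in the range 0..len(labels)-1.
labels, b_contiguous = np.unique(b, return_inverse=True)

print(b_contiguous)  # -> [1 0 2 1 0 0 2]
print(len(labels))   # -> 3 nonempty groups
```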
