Cookbook model averaging question


Cookbook model averaging question

topinsky
Hello. 

I've got a question about the following example from the cookbook:
https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#id11

I am working on my own network in exactly the same way, performing sampling to estimate
some metrics. The results replicate the behaviour of the cookbook example in one respect:
for both cases (simple and nested SBM), the marginal distributions of most vertices have non-zero values for many different clusters, which is why the colouring is so fine-grained. Only a few (1-2) clusters show an explicit dominant group membership; the rest exhibit very spread-out marginals.
Do you have any explanation for this? 
In my network I also have only 1-3 groups of nodes with an explicit dominant group membership, while the rest of the vertices have many non-zero, almost uniformly distributed marginals. For the simple cookbook example it seemed unnatural to me that some vertices have more than 10 non-zero marginal values.
Maybe it is just the result of independent runs of the MCMC algorithm and the arbitrary nature of the group labelling? Or is there some intuition behind this high variance in the group-membership marginals?
I ran the optimisation several times and drew the results. Topologically the outputs were very close to each other, although the colouring was always different except for a few "stable" vertices. Hence, I guess, the resulting marginals have the same properties, but the labels themselves are not informative. Maybe there is some trick to enforce a deterministic labelling policy and stabilise it?

Thank you 
Valery. 

 


Re: Cookbook model averaging question

Tiago Peixoto
On 08.08.2017 01:10, Valery Topinsky wrote:

>
> I am working on my own network in exactly the same way, performing
> sampling to estimate some metrics. The results replicate the behaviour of
> the cookbook example in one respect: for both cases (simple and nested
> SBM), the marginal distributions of most vertices have non-zero values
> for many different clusters, which is why the colouring is so
> fine-grained. Only a few (1-2) clusters show an explicit dominant group
> membership; the rest exhibit very spread-out marginals.
> Do you have any explanation for this?
This means that the posterior distribution is broad, i.e. not concentrated
on any particular partition. This implies either that the model is
misspecified, i.e. your network does not have well-defined groups, or that it
is very noisy.

> In my network I also have only 1-3 groups of nodes with an explicit
> dominant group membership, while the rest of the vertices have many
> non-zero, almost uniformly distributed marginals. For the simple cookbook
> example it seemed unnatural to me that some vertices have more than 10
> non-zero marginal values.
> Maybe it is just the result of independent runs of the MCMC algorithm and
> the arbitrary nature of the group labelling? Or is there some intuition
> behind this high variance in the group-membership marginals?
> I ran the optimisation several times and drew the results. Topologically
> the outputs were very close to each other, although the colouring was
> always different except for a few "stable" vertices. Hence, I guess, the
> resulting marginals have the same properties, but the labels themselves
> are not informative. Maybe there is some trick to enforce a deterministic
> labelling policy and stabilise it?
There is no trick; this variance in the posterior reflects the nature of
your data. If you want a single partition to represent it, you have to
choose between two extremes of the bias-variance trade-off:

   1. Choose the most likely partition, i.e. the one that minimizes the
      description length. (more bias, less variance)

   2. Choose the maximum a posteriori estimate for each node, i.e., the
      most likely node label according to the node marginals. (less bias,
      more variance)

Option 2 averages over the noise, but might not be representative of any
particular fit (especially if the number of groups is fluctuating). Option 1
usually underfits, but may also overfit, depending on your data.

There is a discussion on this here: https://arxiv.org/abs/1705.10225
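
For concreteness, a rough sketch of the two estimates on the lesmis
network (the marginal collection follows the cookbook; the number of
sweeps is arbitrary, and the argmax step is just one way to do it):

from graph_tool.all import *
import numpy as np

g = collection.data["lesmis"]

# Option 1: the most likely partition, i.e. the one that minimizes
# the description length (more bias, less variance).
state_mdl = minimize_blockmodel_dl(g)
b_mdl = state_mdl.get_blocks()

# Option 2: the maximum a posteriori label per node, taken from the
# marginals collected during sampling (less bias, more variance).
state = minimize_blockmodel_dl(g)
pv = None

def collect_marginals(s):
    global pv
    pv = s.collect_vertex_marginals(pv)

mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
                 callback=collect_marginals)

b_map = g.new_vertex_property("int")
for v in g.vertices():
    b_map[v] = np.argmax(pv[v])  # most likely block for each vertex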

Best,
Tiago


--
Tiago de Paula Peixoto <[hidden email]>



Re: Cookbook model averaging question

topinsky
Good day,
Thank you for the reply.

I want to demonstrate the confusing observation in question.
I ran the example from the cookbook:

from graph_tool.all import *
import numpy as np
import matplotlib.pyplot as plt

g = collection.data["lesmis"]
state = BlockState(g, B=20)
state = state.copy(B=g.num_vertices())

# Equilibrate the Markov chain before collecting anything.
mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))

pv = None
h = np.zeros(g.num_vertices() + 1)

def collect_marginals(s):
    global pv
    global h
    B = s.get_nonempty_B()
    h[B] += 1
    pv = s.collect_vertex_marginals(pv)

# Collect the vertex marginals and the histogram of nonempty B values.
mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
                 callback=collect_marginals)

state.draw(pos=g.vp.pos, vertex_shape="pie", vertex_pie_fractions=pv,
           edge_gradient=None)

# Plot the histogram of nonempty B values.
idx = h.nonzero()[0]
plt.bar(idx, h[idx])

I attached the plots. As you can see, the model always uses only a few
(6 to 9) nonempty blocks. But at the same time, the number of different
marginal states (with positive probability) for some vertices is around 70,
almost all of the potential 77 = g.num_vertices().
This means that during independent runs the model can arrive at a new set
of 6 to 9 blocks that simply carry different labels each time. This is what
I meant by:
"Maybe it is just the result of independent runs of the MCMC algorithm
and the arbitrary nature of the group labelling?"

Is there any way to do the sampling without specifying an exact B, but
rather sampling B as well, as described in https://arxiv.org/pdf/1705.10225.pdf, Ch. IV?

[Attached plots: vertex marginals; histogram of the number of nonempty blocks]

Re: Cookbook model averaging question

Tiago Peixoto
On 09.08.2017 18:04, topinsky wrote:

> I attached the plots. As you can see, the model always uses only a few
> (6 to 9) nonempty blocks. But at the same time, the number of different
> marginal states (with positive probability) for some vertices is around
> 70, almost all of the potential 77 = g.num_vertices().
> This means that during independent runs the model can arrive at a new set
> of 6 to 9 blocks that simply carry different labels each time. This is
> what I meant by:
> "Maybe it is just the result of independent runs of the MCMC algorithm
> and the arbitrary nature of the group labelling?"
Oh, the actual vertex labels are not meaningful. You can just re-label them
in a contiguous range before computing the histogram.
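
For instance, with plain NumPy (just a sketch, assuming "state" is the
BlockState from your snippet):

import numpy as np

b = state.get_blocks().a                        # raw, arbitrary block labels
labels, b_contig = np.unique(b, return_inverse=True)
# b_contig holds the same partition, relabelled contiguously as 0..B-1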

> Is there any way to do the sampling without specifying an exact B, but
> rather sampling B as well, as described in
> https://arxiv.org/pdf/1705.10225.pdf, Ch. IV?

This is exactly what happens; this is why your histogram shows several
different values for the number of non-empty groups.

(The number of total groups, including empty ones, will always grow as
necessary.)
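
That is, these two numbers can differ (a quick check, using methods
already present on BlockState):

print(state.get_B())           # total number of blocks, empty ones included
print(state.get_nonempty_B())  # blocks that actually contain vertices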

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>



