
NetworkX: how to get shortest path lengths between nodes, reported by node ID instead of label


I'm new to Python's NetworkX library.

Suppose I import a file in Pajek format:

import networkx as nx
G=nx.read_pajek("pajek_network_file.net")
G=nx.Graph(G)

The contents of my file are as follows (in Pajek, nodes are called "vertices"):

*Network
*Vertices 6
123 Author1
456 Author2
789 Author3
111 Author4
222 Author5
333 Author6
*Edges 
123 333
333 789
789 222
222 111
111 456

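Now I want to compute the shortest path lengths between all pairs of nodes in the network. I am using this function from the library documentation: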

path = nx.all_pairs_shortest_path_length(G)

According to the documentation, it returns: length, a dictionary of shortest path lengths keyed by source and target.

What I get back:

print path
{u'Author4': {u'Author4': 0, u'Author5': 1, u'Author6': 3, u'Author1': 4, u'Author2': 1, u'Author3': 2}, u'Author5': {u'Author4': 1, u'Author5': 0, u'Author6': 2, u'Author1': 3, u'Author2': 2, u'Author3': 1}, u'Author6': {u'Author4': 3, u'Author5': 2, u'Author6': 0, u'Author1': 1, u'Author2': 4, u'Author3': 1}, u'Author1': {u'Author4': 4, u'Author5': 3, u'Author6': 1, u'Author1': 0, u'Author2': 5, u'Author3': 2}, u'Author2': {u'Author4': 1, u'Author5': 2, u'Author6': 4, u'Author1': 5, u'Author2': 0, u'Author3': 3}, u'Author3': {u'Author4': 2, u'Author5': 1, u'Author6': 1, u'Author1': 2, u'Author2': 3, u'Author3': 0}}

As you can see, this is really hard to read and to reuse later...

Ideally, I would like the result in a format like this:

source_node_id, target_node_id, path_length
123, 456, 5
123, 789, 2
123, 111, 4

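In short, I need the result to use (or at least include) the node IDs rather than only the node labels. I also need every possible pair on its own line, together with its shortest path length...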

Is this possible in NetworkX?

Function reference: https://networkx.github.io/documentation/latest/reference/generated/networkx.algorithms.shortest_paths.unweighted.all_pairs_shortest_path_length.html

2 Answers

  • 0

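    In the end, I only needed shortest paths for a subset of the whole network (my real network is large, with 600K nodes and 6M edges). So I wrote a script that reads pairs of source and target nodes from a CSV file, stores them in a numpy array, passes each pair to nx.shortest_path_length, and finally saves the results to a CSV file.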

    The code is below; I'm posting it in case it is useful to someone:

    print "Importing libraries..."
    
    import networkx as nx
    import csv
    import numpy as np
    
    #Import network in Pajek format .net
    myG=nx.read_pajek("MyNetwork_0711_onlylabel.net")
    
    print "Finished importing Network Pajek file"
    
    #Simplify graph into networkx format
    G=nx.Graph(myG)
    
    print "Finished converting to Networkx format"
    
    #Network info
    print "Nodes found: ",G.number_of_nodes()
    print "Edges found: ",G.number_of_edges()
    
    
    #Reading file and storing to array
    with open('paired_nodes.csv','rb') as csvfile:
        reader = csv.reader(csvfile, delimiter = ',', quoting=csv.QUOTE_MINIMAL)#, quotechar = '"')
        data = [data for data in reader]
    paired_nodes = np.asarray(data)
    #Values stay as strings so they match the node names produced by read_pajek
    
    print "Finished reading paired nodes file"
    
    #Add extra column in array to store shortest path value
    paired_nodes = np.append(paired_nodes,np.zeros([len(paired_nodes),1],dtype=int),1)
    
    print "Just appended new column to paired nodes array"
    
    #Get shortest path for every pair of nodes
    
    for index in range(len(paired_nodes)):
        try:
            shortest=nx.shortest_path_length(G,paired_nodes[index,0],paired_nodes[index,1])
            #print shortest
            paired_nodes[index,2] = shortest
        except nx.NetworkXNoPath:
            #print '99999'  #Value to print when no path is found
            paired_nodes[index,2] = 99999
    
    print "Finished calculating shortest path for paired nodes"
    
    #Store results to csv file      
    f = open('shortest_path_results.csv','w')
    
    for item in paired_nodes:
        f.write(','.join(map(str,item)))
        f.write('\n')
    f.close()
    
    print "Done writing file with results, bye!"
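
For anyone on Python 3 with a newer NetworkX, the per-pair loop above could be sketched roughly as follows. This is only a sketch under assumptions: the helper name `pair_lengths`, the toy graph, and the in-memory buffer are made up for illustration, and catching `nx.NodeNotFound` assumes NetworkX 2.0 or later. A real run would load the Pajek file and read the pairs from the CSV instead.

```python
import csv
import io

import networkx as nx


def pair_lengths(G, pairs, no_path=99999):
    """Return (source, target, length) rows for the requested pairs only."""
    rows = []
    for source, target in pairs:
        try:
            length = nx.shortest_path_length(G, source, target)
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            # Same sentinel value the script above writes when no path exists;
            # here it is also used when a node is missing from the graph.
            length = no_path
        rows.append((source, target, length))
    return rows


# Tiny stand-in graph (hypothetical data) with two disconnected components.
G = nx.Graph([("a", "b"), ("b", "c"), ("d", "e")])
rows = pair_lengths(G, [("a", "c"), ("a", "e"), ("a", "zzz")])

# csv.writer takes care of delimiters and quoting for any text file object.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

Computing only the requested pairs avoids building the full all-pairs table, which matters on a graph with hundreds of thousands of nodes.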
    
  • 1

    How about something like this?

    import networkx as nx
    G=nx.read_pajek("pajek_network_file.net")
    G=nx.Graph(G)
    # first get all the lengths
    path_lengths = nx.all_pairs_shortest_path_length(G)

    # now iterate over all pairs of nodes
    for src in G.nodes():
        # look up the id as desired
        id_src = G.node[src].get('id')
        for dest in G.nodes():
            if src != dest: # ignore self-self paths
                id_dest =  G.node[dest].get('id')
                l = path_lengths.get(src).get(dest)
                print "{}, {}, {}".format(id_src, id_dest, l)
    

    This produces the output

    111, 222, 1
    111, 333, 3
    111, 123, 4
    111, 456, 1
    111, 789, 2
    ...
    

    If you need further processing (such as sorting), store the l values instead of just printing them.

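    (You could loop over the pairs more cleanly with something like itertools.combinations(G.nodes(), 2), but if you're not familiar with it, the approach above is a bit more explicit.)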
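
A rough sketch of the itertools.combinations variant, which also stores the results instead of printing them right away. It makes several assumptions: Python 3, NetworkX 2.0 or later (where `G.nodes[...]` replaces `G.node[...]` and `all_pairs_shortest_path_length` yields (source, lengths) pairs rather than returning a dict), and an undirected graph, so each unordered pair is listed once. The graph is rebuilt by hand from the question's file so the snippet runs without it; with the real file, `nx.Graph(nx.read_pajek(...))` would take its place.

```python
import itertools

import networkx as nx

# Rebuild the question's example by hand. read_pajek names each node by its
# label and stores the Pajek vertex number in the 'id' attribute (a string).
labels = {"123": "Author1", "456": "Author2", "789": "Author3",
          "111": "Author4", "222": "Author5", "333": "Author6"}
edges = [("123", "333"), ("333", "789"), ("789", "222"),
         ("222", "111"), ("111", "456")]

G = nx.Graph()
for node_id, label in labels.items():
    G.add_node(label, id=node_id)
G.add_edges_from((labels[a], labels[b]) for a, b in edges)

# dict() copes with both a plain dict and an iterator of (source, lengths).
path_lengths = dict(nx.all_pairs_shortest_path_length(G))

rows = []
for src, dest in itertools.combinations(G.nodes(), 2):
    length = path_lengths[src].get(dest)  # None if dest is unreachable
    rows.append((G.nodes[src]["id"], G.nodes[dest]["id"], length))

rows.sort()  # IDs are strings here, so this is a lexicographic sort
for row in rows:
    print("{}, {}, {}".format(*row))
```

Because the rows are kept in a list, they can be sorted, filtered, or written to a file later rather than only printed.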
