Spark GraphX - How can I read from a JSON file in Spark and create a graph from the data?

I'm new to Spark and Scala, and I am trying to read a bunch of Twitter data from a JSON file and turn it into a graph where each vertex represents a tweet and each edge connects a retweet to the original posted item. So far I have managed to read from the JSON file and figure out the schema of my RDD. Now I believe I need to somehow take the data from the SchemaRDD object and create one RDD for the vertices and one RDD for the edges. Is this the right way to approach this, or is there an alternative solution? Any help and suggestions would be highly appreciated.
1.2k Views · Asked by Adelina Balasa
There is 1 answer below.
This really depends on your JSON file. You need to parse the data from the JSON file and create your vertices and edges based on the parsed data. There isn't a single prescribed way to implement this; it's really up to the programmer. One approach is to build a vertex collection and an edge collection from the parsed data, parallelize them into a `VertexRDD` and an `EdgeRDD`, and then construct the graph you need from those. Hope I helped.
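As a minimal sketch of that approach: the snippet below assumes each JSON record carries a numeric `id`, a `text` field, and, for retweets, a nested `retweeted_status.id` pointing at the original tweet. Those field names, the file path, and the object name are assumptions for illustration only; adapt them to your actual schema. It is written against the DataFrame API that replaced `SchemaRDD` in Spark 1.3, but the idea (two RDDs, then `Graph(...)`) is the same.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.graphx.{Edge, Graph}

object TweetGraphSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TweetGraph").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Read the JSON file; Spark infers the schema from the data.
    // Field names below ("id", "text", "retweeted_status.id") are assumed.
    val tweets = sqlContext.read.json("tweets.json")

    // One vertex per tweet: (vertexId, tweet text)
    val vertices = tweets.select("id", "text").rdd
      .map(row => (row.getLong(0), row.getString(1)))

    // One edge per retweet: retweet id -> original tweet id
    val edges = tweets.select("id", "retweeted_status.id").rdd
      .filter(row => !row.isNullAt(1))
      .map(row => Edge(row.getLong(0), row.getLong(1), "retweet_of"))

    // Graph(...) builds the VertexRDD and EdgeRDD internally.
    val graph = Graph(vertices, edges)
    println(s"vertices: ${graph.numVertices}, edges: ${graph.numEdges}")

    sc.stop()
  }
}
```

From here you can run any GraphX algorithm on `graph`, e.g. `graph.connectedComponents()` to group each original tweet with all of its retweets.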