List: Spark | Curated by Sai Parvathaneni

Feb 17, 2025
18 stories
Sai Parvathaneni
Spark Optimization Techniques: Repartition() and Coalesce()
Understanding Data Skewness in Apache Spark
Aug 27, 2024
Sai Parvathaneni
Spark Optimization Techniques: groupByKey() and reduceByKey()
Understanding Shuffle in Apache Spark
Aug 26, 2024
2
Sai Parvathaneni
Spark Optimization Techniques: Broadcast() Join
A Broadcast Join in Apache Spark is an optimization technique that is used to improve the performance of joins involving a large dataset…
Aug 28, 2024
Sai Parvathaneni
Spark Optimization Techniques: Cache() and Persist()
When working with Apache Spark, optimizing the performance of your Spark jobs is crucial, especially when dealing with large datasets and…
Aug 30, 2024
1
Sai Parvathaneni
Spark Optimization Techniques: Types of Joins — Visual Representation
Joining datasets is a common operation in data processing, and Apache Spark provides several ways to perform joins efficiently. However…
Sep 1, 2024
Sai Parvathaneni
Mastering Spark Query Plans: Narrow Transformations
In this article, we will explore narrow transformations in Apache Spark. We will go through the query plan produced by Spark and explain…
Sep 4, 2024
1
Sai Parvathaneni
Mastering Spark Query Plans: Wide Transformations — Repartition and Coalesce
In Spark, transformations can be categorized into narrow and wide transformations. While narrow transformations operate within the same…
Sep 6, 2024
Sai Parvathaneni
Mastering Spark Query Plans: Wide Transformations — Joins
In Spark, joins are considered wide transformations because they require shuffling of data across the cluster. When you join two…
Sep 8, 2024
Sai Parvathaneni
Mastering Spark Query Plans: Wide Transformations — GroupBy Aggregations
In Spark, groupBy transformations are wide operations because they involve shuffling data across the cluster to group similar keys…
Sep 10, 2024
Sai Parvathaneni
Spark Optimization Techniques: The Role of Serialization
In distributed systems like Apache Spark, performance optimization is crucial for ensuring that data processing jobs are efficient…
Sep 10, 2024
In Towards Dev by Sai Parvathaneni
Building a Streaming Data Pipeline: Spark vs. Flink Comparison with Kafka Integration
When handling streams of data, two prominent frameworks that often come into play are Apache Spark and Apache Flink. Both are commonly used…
Sep 13, 2024
Sai Parvathaneni
Spark Optimization Techniques: Predicate Pushdown
Apache Spark is a powerful tool for processing massive datasets, and part of what makes it so effective is its ability to scale and perform…
Sep 15, 2024
In Towards Dev by Sai Parvathaneni
Apache Spark for Dummies: Part 1 Architecture and RDDs
This series will be your ultimate guide to Apache Spark, I promise.
May 19, 2023
Sai Parvathaneni
Short Reads: SparkContext and SparkSession: Pizza Analogy
Let’s consider SparkContext and SparkSession. Just as you would handle pizza, from making the dough, adding toppings, to finally baking it…
Jun 2, 2023
In Towards Dev by Sai Parvathaneni
Apache Spark for Dummies Part 2: DataFrames, Datasets, and Spark SQL
Welcome back to our exploration into Apache Spark! In Part 1 of our series, we delved into the basics of Spark and got our hands dirty with…
May 29, 2023
2
In Towards Dev by Sai Parvathaneni
Building a Real-time Log Monitoring System with Kafka and Spark Streaming
By leveraging Kafka for data streaming and integration, and Spark for real-time data processing and analysis, organizations can achieve a…
Jun 6, 2023
1
Sai Parvathaneni
Apache Spark for Dummies Part 3: Data Processing and Analysis
Welcome back to the third part of our series on Apache Spark. We’ve discussed Apache Spark’s architecture, RDDs, DataFrames, and Datasets…
Jun 11, 2023
Sai Parvathaneni
Apache Spark for Dummies: Part 4 — Advanced Spark Features
Welcome to the fourth part of our Apache Spark series. In this segment, we will explore some of the advanced features of Apache Spark that…
Jun 17, 2023