Data read modified to work with the new pipeline
lim185 committed Oct 1, 2025
1 parent a94fffa commit f4418e2
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion analysis.py
@@ -45,7 +45,12 @@ def pca(data, features):

 if __name__ == "__main__":
     spark = SparkSession.builder.appName("train").getOrCreate()
-    data = load(spark, split=[0.9, 0.5, 0.5])
+    SPLIT = [0.9, 0.05, 0.05]
+    load_from_scratch = False
+    if load_from_scratch:
+        data = etl(spark, split=SPLIT)
+    else:
+        data = read(spark)

     pca(data)
     category_distribution(data)
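
Note on the split values: the removed call passed [0.9, 0.5, 0.5], which sums to 1.9, while the new SPLIT sums to 1.0. A small guard could catch this kind of slip before any data is loaded. The sketch below is only an illustration: `validate_split` is a hypothetical helper that is not part of this repository, and it assumes the split is meant to be a list of positive fractions that partition the data.

```python
import math


def validate_split(split, tol=1e-9):
    """Return the split as floats, or raise ValueError if it is not a partition.

    Hypothetical helper (not in analysis.py): assumes every entry is a
    positive fraction and that the entries should sum to 1.0.
    """
    fractions = [float(x) for x in split]
    if not fractions or any(f <= 0 for f in fractions):
        raise ValueError(f"split must be non-empty and positive, got {split!r}")
    total = math.fsum(fractions)
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"split fractions sum to {total}, expected 1.0")
    return fractions


# The value introduced by this commit passes the check.
SPLIT = validate_split([0.9, 0.05, 0.05])
```

If the loader hands these weights to something like `DataFrame.randomSplit`, which normalizes weights that do not sum to 1.0, the old value would not have raised an error but would have produced different proportions than intended. Whether `etl` does that is an assumption here, since its body is not shown in this diff.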
