Splitting dataset into train, validation and test sets
lim185 committed Dec 8, 2024
1 parent 0d2aabc commit c6d8a75
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion train.py
@@ -20,7 +20,11 @@ def main():
     spark = SparkSession.builder.appName("train").getOrCreate()
     pipe = SpectrogramPipe(spark)
     data = pipe.spectrogram_pipe(path, labels)
-    print(data.head())
+    train_df, validation_df, test_df = data.randomSplit([0.8, 0.1, 0.1],
+                                                        seed=42)
+    print(train_df.count())
+    print(validation_df.count())
+    print(test_df.count())

     return

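Note on the split: a weighted random split of this kind generally assigns each row to a split independently, so the three counts printed by this commit should land near 80/10/10 of the total rather than exactly on it, and a fixed seed makes the assignment repeatable on the same input. The sketch below illustrates that behaviour in plain Python. The `random_split` helper is hypothetical and written only for illustration; it is not Spark's implementation and is not part of `train.py`.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def random_split(rows, weights, seed=None):
    """Hypothetical helper: place each row in one split via an independent seeded draw."""
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and sum to a positive value")

    total = sum(weights)
    # Normalize the weights and turn them into cumulative upper bounds,
    # e.g. [0.8, 0.1, 0.1] becomes roughly [0.8, 0.9, 1.0].
    bounds = list(accumulate(w / total for w in weights))

    rng = random.Random(seed)  # same seed and same input order give the same split
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform draw in [0, 1)
        # Index of the first bound strictly greater than the draw; the min()
        # guards against the last bound rounding to slightly below 1.0.
        index = min(bisect_right(bounds, draw), len(weights) - 1)
        splits[index].append(row)
    return splits


train, validation, test = random_split(range(10_000), [0.8, 0.1, 0.1], seed=42)
print(len(train), len(validation), len(test))
```

Every row ends up in exactly one split, but the split sizes vary from run to run unless the seed is fixed, which is why printing the three counts is a reasonable sanity check.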
