first commit

Lorenzo Iovino 2025-05-23 14:45:22 +02:00
commit b6d47982c7
8 changed files with 1353 additions and 0 deletions

28  .env.sample  Normal file

@@ -0,0 +1,28 @@
# Whisper Parallel Configuration
# SSH Key Configuration
KEY_NAME=whisper-key
KEY_FILE=$HOME/.ssh/whisper-key.pem
SECURITY_GROUP=whisper-sg
# AWS Instance Configuration
INSTANCE_TYPE=g4dn.12xlarge
REGION=eu-south-1
AMI_ID=ami-059603706d3734615
# Video/Audio Processing
VIDEO_FILE=mio_video.mp4
START_MIN=0
END_MIN=0
SHIFT_SECONDS=0
SHIFT_ONLY=false
INPUT_PREFIX=
# GPU Configuration
GPU_COUNT=1
# Processing Options
NUM_SPEAKERS=
FIX_START=true
# API Tokens
HF_TOKEN=your_huggingface_token_here

53  .gitignore  vendored  Normal file

@@ -0,0 +1,53 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Environment variables
.env
.env.local
.env.development.local
.env.test.local
.env.production.local
# Processed files
*.wav
*.mp3
*.mp4
*.srt
*.vtt
transcript*.*
!transcription-runner/mio_video.mp4
# Logs
*.log
# OS specific
.DS_Store
._.DS_Store
*.swp
*.swo
# IDE
.idea/
.vscode/
*.sublime-project
*.sublime-workspace
.ropeproject/

215  README.md  Normal file

@@ -0,0 +1,215 @@
# Transcription Runner with Multi-Chunk Processing and Parallel GPUs
This package lets you:
- Launch a GPU EC2 instance on AWS (g4dn.12xlarge)
- Split an `.mp4` video file into multiple chunks and transcribe them
- Automatically generate a transcript plus speaker diarization
- Download the output files
- Terminate the instance to keep costs down
- Apply a time shift to the transcript timestamps
- Configure every option through a `.env` file
---
## ✅ Prerequisites
### 1. **Install the AWS CLI**
If you have not installed the AWS CLI yet:
- On macOS with Homebrew:
```bash
brew install awscli
```
- On Linux (Debian/Ubuntu):
```bash
sudo apt update
sudo apt install awscli
```
### 2. **Configure the AWS CLI**
Once installed, run:
```bash
aws configure
```
Enter:
- Access key ID
- Secret access key
- Default region (e.g. `eu-south-1`)
- Output format: `json`
### 3. **Create an SSH key pair for EC2**
In a terminal, run:
```bash
aws ec2 create-key-pair --key-name whisper-key --query 'KeyMaterial' --output text > ~/.ssh/whisper-key.pem
chmod 400 ~/.ssh/whisper-key.pem
```
### 4. **Install netcat**
- On macOS with Homebrew:
```bash
brew install netcat
```
- On Linux (Debian/Ubuntu):
```bash
sudo apt install netcat
```
### 5. **Sign up on Hugging Face and get a token**
Go to: https://huggingface.co/settings/tokens
Create a token with read access to the models and copy its value.
### 6. **IAM role "WhisperS3Profile" with S3 access**
Make sure your AWS account has an IAM instance profile named "WhisperS3Profile" whose role grants S3 access; the script attaches it to the EC2 instance at launch.
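If the profile does not exist yet, it can be created once with the AWS CLI. The sketch below is illustrative (the role name `WhisperS3Role` and the broad `AmazonS3FullAccess` policy are assumptions; scope the permissions down as needed):
```bash
# Role that EC2 instances are allowed to assume
aws iam create-role \
  --role-name WhisperS3Role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'
# Grant S3 access (broad managed policy used here for simplicity)
aws iam attach-role-policy \
  --role-name WhisperS3Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
# Instance profile with the exact name the script passes to run-instances
aws iam create-instance-profile --instance-profile-name WhisperS3Profile
aws iam add-role-to-instance-profile \
  --instance-profile-name WhisperS3Profile \
  --role-name WhisperS3Role
```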
### 7. **Configure the .env file**
Copy `.env.sample` to `.env` and adjust the values to your needs:
```bash
cp .env.sample .env
nano .env  # or use whatever editor you prefer
```
---
## ▶️ How to use
### Basic usage
```bash
chmod +x whisper_parallel.sh
./whisper_parallel.sh
```
### Configuration via the .env file
Edit the `.env` file with your parameters, then run:
```bash
./whisper_parallel.sh
```
### Passing parameters as environment variables (overrides .env)
```bash
VIDEO_FILE="mia_intervista.mp4" START_MIN=5 END_MIN=15 GPU_COUNT=4 ./whisper_parallel.sh
```
### Available parameters
These parameters can be set in the `.env` file or as environment variables:
| Parameter | Description | Default |
|-----------|-------------|---------|
| VIDEO_FILE | The video/audio file to transcribe | mio_video.mp4 |
| START_MIN | Start minute for cropping | 0 |
| END_MIN | End minute for cropping | 0 (to the end) |
| SHIFT_SECONDS | Shift timestamps by X seconds | 0 |
| GPU_COUNT | Number of chunks to split the audio into | 1 |
| NUM_SPEAKERS | Number of speakers, if known in advance | (auto) |
| DIARIZATION_ENABLED | Enable/disable speaker recognition | true |
| INSTANCE_TYPE | EC2 instance type | g4dn.12xlarge |
| REGION | AWS region | eu-south-1 |
| BUCKET_NAME | S3 bucket name | whisper-video-transcripts |
| HF_TOKEN | Hugging Face token for Pyannote | (required) |
| FIX_START | Prepend silence to improve capture of the first seconds | true |
| SHIFT_ONLY | Only apply the timestamp shift to existing files | false |
| INPUT_PREFIX | Prefix of the input files when SHIFT_ONLY is used | "" |
| WHISPER_MODEL | Whisper model to use | large |
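As a concrete example, a `.env` for a two-speaker interview processed in four chunks might look like this (values are purely illustrative):
```bash
VIDEO_FILE=intervista.mp4
START_MIN=0
END_MIN=0
GPU_COUNT=4
NUM_SPEAKERS=2
DIARIZATION_ENABLED=true
WHISPER_MODEL=large
INSTANCE_TYPE=g4dn.12xlarge
REGION=eu-south-1
BUCKET_NAME=whisper-video-transcripts
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
```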
---
## 📦 Output
When the run finishes you will find these files in the current directory:
- `{file-name}_{start}_{end}_{random}.txt` → raw transcript
- `{file-name}_{start}_{end}_{random}_final.txt` → transcript with speaker labels
- `{file-name}_{start}_{end}_{random}.srt` → SRT subtitle file
- `{file-name}_{start}_{end}_{random}.vtt` → VTT file for web subtitles
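For example, a run on `conferenza.mp4` with the default crop settings might produce files like these (the 8-character suffix is random):
```
conferenza_0_0_a1b2c3d4.txt
conferenza_0_0_a1b2c3d4_final.txt
conferenza_0_0_a1b2c3d4.srt
conferenza_0_0_a1b2c3d4.vtt
```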
---
## 🚀 Multi-Chunk Mode
The current version of the script automatically splits the audio into several parts and processes them in parallel on the GPUs (a rough sketch of the idea follows the list below). This:
1. Improves memory usage for long files
2. Speeds up transcription of long recordings
3. Makes better use of the available hardware
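The chunking happens on the EC2 instance; conceptually it amounts to cutting the audio into equal slices with ffmpeg, one per GPU, roughly like this sketch (file names and the fixed chunk count are illustrative, not the script's actual internals):
```bash
# Split audio.wav into GPU_COUNT equal chunks (illustrative sketch only)
GPU_COUNT=4
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 audio.wav)
CHUNK_LEN=$(python3 -c "print($DURATION / $GPU_COUNT)")
for i in $(seq 0 $((GPU_COUNT - 1))); do
  START=$(python3 -c "print($i * $CHUNK_LEN)")
  ffmpeg -y -i audio.wav -ss "$START" -t "$CHUNK_LEN" -c copy "chunk_${i}.wav"
done
```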
### Performance tips
1. **Instance choice**: g4dn.xlarge is enough for short files; use g4dn.12xlarge with multiple GPUs for long files
2. **Number of chunks**: for long files, splitting into more chunks helps keep memory usage under control
3. **Model**: for very long files, consider the "medium" or "base" model instead of "large"
---
## 🧪 Usage examples
### Configuration via .env
Edit the `.env` file with your parameters, then run:
```bash
./whisper_parallel.sh
```
### Transcribe a whole file
```bash
VIDEO_FILE="conferenza.mp4" ./whisper_parallel.sh
```
### Transcribe a specific portion
```bash
VIDEO_FILE="lezione.mp4" START_MIN=10 END_MIN=20 ./whisper_parallel.sh
```
### Split a long file into more chunks
```bash
VIDEO_FILE="intervista.mp4" GPU_COUNT=6 ./whisper_parallel.sh
```
### Disable diarization (transcription only)
```bash
VIDEO_FILE="audio.mp4" DIARIZATION_ENABLED=false ./whisper_parallel.sh
```
### Specify the number of speakers
```bash
VIDEO_FILE="intervista.mp4" NUM_SPEAKERS=2 ./whisper_parallel.sh
```
### Shift the timestamps of an existing transcription
```bash
SHIFT_ONLY=true SHIFT_SECONDS=30 INPUT_PREFIX="mia_trascrizione" ./whisper_parallel.sh
```
---
## 🔄 Advanced features
### Timestamp shifting
The script can shift the timestamps in the transcript files, which is useful when:
- You cut away an initial portion of the video
- You need to sync subtitles with an edited video
- You work with segments extracted from a longer video
### File types supported for shifting
- `.srt` (SubRip Text)
- `.vtt` (WebVTT)
- `.txt` (transcript with timestamps)
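For instance, with `SHIFT_SECONDS=30` an SRT cue is moved forward like this and written to a new file with the `_shifted` suffix (times are illustrative; shifts that would become negative are clamped to zero):
```
Before (mia_trascrizione.srt):          00:00:05,440 --> 00:00:08,300
After  (mia_trascrizione_shifted.srt):  00:00:35,440 --> 00:00:38,300
```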
---
## ☁️ Notes
- The EC2 instance is **terminated automatically** when the run finishes.
- Audio files are removed from the S3 bucket after download.
- Output file names include a random suffix to avoid collisions.
- If the script is interrupted, it still cleans up the AWS resources.
- The companion file `parallel_transcript.py` is required for processing on EC2.
## Technical details
- Uses FFmpeg for audio extraction
- Automatically creates the AWS security group and uses the default VPC when available
- Runs an automatic cleanup when the script exits
- Supports high-quality diarization via Pyannote/WhisperX
- Provides timestamp shifting for every output format
## Security
- The script creates a security group that allows SSH access from any IP (0.0.0.0/0)
- AWS credentials with EC2 and S3 permissions are required
- SSH keys are used for secure access to the instance
- The `.env` file contains sensitive data and should not be committed to version control (it is already listed in `.gitignore`)
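If you want to lock SSH down after the first run, you can replace the open rule with one restricted to your current public IP, for example (a sketch; the group name and region must match your `.env`):
```bash
MY_IP=$(curl -s https://checkip.amazonaws.com)
aws ec2 revoke-security-group-ingress --group-name whisper-sg \
  --protocol tcp --port 22 --cidr 0.0.0.0/0 --region eu-south-1
aws ec2 authorize-security-group-ingress --group-name whisper-sg \
  --protocol tcp --port 22 --cidr "${MY_IP}/32" --region eu-south-1
```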

398  parallel_transcript.py  Normal file

@@ -0,0 +1,398 @@
import os
import argparse
import whisper
import torch
import time
import threading
import json
from pyannote.audio import Pipeline
from datetime import timedelta
import numpy as np
from pydub import AudioSegment
import math
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
def start_spinner():
def spin():
while not spinner_done:
print(".", end="", flush=True)
time.sleep(1)
global spinner_done
spinner_done = False
t = threading.Thread(target=spin)
t.start()
return t
def stop_spinner(thread):
global spinner_done
spinner_done = True
thread.join()
print("")
def extend_audio_beginning(input_audio, output_audio, silence_duration=2000):
"""Aggiunge un breve silenzio all'inizio dell'audio per catturare meglio i primi secondi"""
print(f"🔄 Aggiungendo {silence_duration/1000} secondi di silenzio all'inizio dell'audio...")
audio = AudioSegment.from_file(input_audio)
silence = AudioSegment.silent(duration=silence_duration) # 2 secondi di silenzio
extended_audio = silence + audio
extended_audio.export(output_audio, format="wav")
print(f"✅ Audio esteso salvato come {output_audio}")
return output_audio
def transcribe_audio(audio_path, model_size="large"):
"""Trascrivi l'audio con Whisper usando impostazioni ottimizzate"""
print(f"🔹 Trascrizione con Whisper ({model_size})...")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"⚙️ Uso dispositivo: {device.upper()}")
model = whisper.load_model(model_size).to(device)
# Impostazioni avanzate per migliorare il rilevamento del discorso
options = {
"language": "it",
"condition_on_previous_text": True, # Migliora la coerenza tra segmenti
"suppress_tokens": [-1], # Sopprime i tokens di silenzio
"initial_prompt": "Trascrizione di una conversazione tra tre persone." # Contestualizza
}
    # Enable word-level timestamps only if this Whisper version supports them,
    # checked via the function signature instead of running a throw-away transcription
    import inspect
    if "word_timestamps" in inspect.signature(whisper.transcribe).parameters:
        options["word_timestamps"] = True
        print("✅ Utilizzo timestamp a livello di parola")
    else:
        print("⚠️ Questa versione di Whisper non supporta i timestamp a livello di parola")
spinner = start_spinner()
start = time.time()
result = model.transcribe(audio_path, **options)
stop_spinner(spinner)
duration = time.time() - start
print(f"✅ Trascrizione completata in {round(duration, 2)} secondi")
# Salva anche i timestamp delle parole per post-processing
with open(f"{TRANSCRIPT_FILE}.words.json", "w", encoding="utf-8") as f:
json.dump(result, f, ensure_ascii=False, indent=2)
with open(TRANSCRIPT_FILE, "w", encoding="utf-8") as f:
f.write(result["text"])
return result["segments"]
def diarize_audio(audio_path, hf_token, num_speakers=None):
"""Diarizzazione audio con parametri ottimizzati per sovrapposizioni"""
print("🔹 Riconoscimento speaker (v3.1) con Pyannote...")
# Carica il modello senza tentare di modificare i parametri
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token=hf_token)
spinner = start_spinner()
start = time.time()
# Utilizzo del numero di speaker se specificato
if num_speakers is not None:
print(f" Utilizzo {num_speakers} speaker come specificato")
diarization = pipeline(audio_path, num_speakers=num_speakers)
else:
diarization = pipeline(audio_path)
stop_spinner(spinner)
duration = time.time() - start
print(f"✅ Speaker identificati in {round(duration, 2)} secondi")
# Analizza gli speaker identificati
speakers = set()
for segment, _, speaker in diarization.itertracks(yield_label=True):
speakers.add(speaker)
print(f"👥 Identificati {len(speakers)} speaker: {', '.join(sorted(speakers))}")
# Salva la diarizzazione grezza per ispezione
with open(f"{OUTPUT_FILE}.diarization.json", "w", encoding="utf-8") as f:
segments = []
for segment, _, speaker in diarization.itertracks(yield_label=True):
segments.append({
"start": segment.start,
"end": segment.end,
"speaker": speaker
})
json.dump(segments, f, indent=2)
return diarization
def format_time(seconds, srt=False):
"""Formatta il tempo in formato leggibile"""
td = timedelta(seconds=float(seconds))
hours, remainder = divmod(td.seconds, 3600)
minutes, seconds = divmod(remainder, 60)
milliseconds = round(td.microseconds / 1000)
if srt:
return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"
else:
return f"{hours:d}:{minutes:02d}:{seconds:02d}.{milliseconds:03d}"
def find_overlapping_speech(diarization, threshold=0.5):
"""Identifica segmenti con sovrapposizione di parlato"""
overlap_segments = []
speaker_segments = {}
# Organizza i segmenti per speaker
for segment, _, speaker in diarization.itertracks(yield_label=True):
if speaker not in speaker_segments:
speaker_segments[speaker] = []
speaker_segments[speaker].append((segment.start, segment.end))
# Trova sovrapposizioni tra speaker diversi
speakers = list(speaker_segments.keys())
for i in range(len(speakers)):
for j in range(i+1, len(speakers)):
speaker1 = speakers[i]
speaker2 = speakers[j]
for seg1_start, seg1_end in speaker_segments[speaker1]:
for seg2_start, seg2_end in speaker_segments[speaker2]:
# Controlla se i segmenti si sovrappongono
if seg1_start < seg2_end and seg2_start < seg1_end:
overlap_start = max(seg1_start, seg2_start)
overlap_end = min(seg1_end, seg2_end)
overlap_duration = overlap_end - overlap_start
if overlap_duration >= threshold:
overlap_segments.append({
"start": overlap_start,
"end": overlap_end,
"speakers": [speaker1, speaker2],
"duration": overlap_duration
})
# Combina sovrapposizioni vicine
if overlap_segments:
overlap_segments.sort(key=lambda x: x["start"])
merged = [overlap_segments[0]]
for current in overlap_segments[1:]:
previous = merged[-1]
if current["start"] - previous["end"] < 0.5: # Meno di mezzo secondo di distanza
# Unisci gli intervalli
previous["end"] = max(previous["end"], current["end"])
previous["speakers"] = list(set(previous["speakers"] + current["speakers"]))
previous["duration"] = previous["end"] - previous["start"]
else:
merged.append(current)
overlap_segments = merged
return overlap_segments
def match_transcript_to_speakers(transcript_segments, diarization, min_segment_length=1.0, max_chars=150):
"""Abbina la trascrizione agli speaker con gestione migliorata delle sovrapposizioni"""
print("🔹 Combinazione transcript + speaker...")
# Trova le potenziali sovrapposizioni
overlaps = find_overlapping_speech(diarization)
if overlaps:
print(f" Rilevate {len(overlaps)} potenziali sovrapposizioni di parlato")
with open(f"{OUTPUT_FILE}.overlaps.json", "w", encoding="utf-8") as f:
json.dump(overlaps, f, indent=2)
# Combina segmenti brevi dello stesso speaker
combined_segments = []
current_segment = None
for segment, _, speaker in diarization.itertracks(yield_label=True):
start_time = round(segment.start, 2)
end_time = round(segment.end, 2)
# Skip se il segmento è troppo breve
if end_time - start_time < 0.2:
continue
# Se è il primo segmento o c'è un cambio di speaker
if current_segment is None or current_segment["speaker"] != speaker:
if current_segment is not None:
combined_segments.append(current_segment)
current_segment = {
"start": start_time,
"end": end_time,
"speaker": speaker
}
else:
# Estendi il segmento corrente
current_segment["end"] = end_time
# Aggiungi l'ultimo segmento
if current_segment is not None:
combined_segments.append(current_segment)
# Ora abbina il testo ai segmenti combinati
output_segments = []
counter = 1
for segment in combined_segments:
start_time = segment["start"]
end_time = segment["end"]
speaker = segment["speaker"]
# Skip segmenti troppo brevi dopo la combinazione
if end_time - start_time < min_segment_length:
continue
# Trova il testo che corrisponde a questo intervallo di tempo
text = ""
for s in transcript_segments:
# Se il segmento di testo si sovrappone al segmento speaker
if (s["start"] < end_time and s["end"] > start_time):
text += s["text"] + " "
text = text.strip()
if text:
# Controlla se questo segmento è in una sovrapposizione
is_overlap = False
overlap_speakers = []
for overlap in overlaps:
if (start_time < overlap["end"] and end_time > overlap["start"]):
is_overlap = True
overlap_speakers = overlap["speakers"]
break
# Formatta l'output in base alla presenza di sovrapposizione
if is_overlap and speaker in overlap_speakers:
speaker_text = f"[{speaker}+] " if len(overlap_speakers) > 1 else f"[{speaker}] "
else:
speaker_text = f"[{speaker}] "
# Crea il segmento completo con testo
output_segment = {
"start": start_time,
"end": end_time,
"speaker": speaker,
"speaker_text": speaker_text,
"text": text
}
output_segments.append(output_segment)
# Ora formatta e salva l'output finale
output = []
srt = []
vtt = ["WEBVTT\n"]
for i, segment in enumerate(output_segments, 1):
start_time = segment["start"]
end_time = segment["end"]
speaker_text = segment["speaker_text"]
text = segment["text"]
formatted_text = f"{speaker_text}({format_time(start_time)} - {format_time(end_time)}): {text}"
srt_text = f"{counter}\n{format_time(start_time, True)} --> {format_time(end_time, True)}\n{speaker_text}{text}"
vtt_text = f"{format_time(start_time)} --> {format_time(end_time)}\n{speaker_text}{text}"
output.append(formatted_text)
srt.append(srt_text)
vtt.append(vtt_text)
counter += 1
with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
f.write("\n".join(output))
with open(SRT_FILE, "w", encoding="utf-8") as f:
f.write("\n\n".join(srt))
with open(VTT_FILE, "w", encoding="utf-8") as f:
f.write("\n\n".join(vtt))
print("✅ Output finale salvato:", OUTPUT_FILE, SRT_FILE, VTT_FILE)
def parse_timestamp(time_str):
"""Convert a timestamp string to seconds"""
# Handle both SRT (00:00:00,000) and standard format (00:00:00.000)
time_str = time_str.replace(',', '.')
hours, minutes, seconds = time_str.split(':')
hours = int(hours)
minutes = int(minutes)
seconds = float(seconds)
total_seconds = hours * 3600 + minutes * 60 + seconds
return total_seconds
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Trascrizione + Speaker Diarization avanzata")
parser.add_argument("--audio", help="File audio WAV", required=True)
parser.add_argument("--token", help="Token Hugging Face per Pyannote")
parser.add_argument("--model", default="large", help="Modello Whisper (tiny, base, medium, large)")
parser.add_argument("--no-diarization", action="store_true", help="Disabilita il riconoscimento speaker")
parser.add_argument("--output-prefix", default="transcript", help="Prefisso per i file di output")
parser.add_argument("--num-speakers", type=int, default=None, help="Numero di speaker se conosciuto in anticipo")
parser.add_argument("--fix-start", action="store_true", help="Aggiungi silenzio all'inizio per catturare meglio i primi secondi")
parser.add_argument("--min-segment", type=float, default=1.0, help="Lunghezza minima dei segmenti in secondi")
args = parser.parse_args()
# Use Hugging Face token from environment variable if not provided via argument
hf_token = args.token or os.getenv("HF_TOKEN")
if not hf_token:
raise ValueError("Token Hugging Face non fornito. Specificarlo con --token o nella variabile HF_TOKEN nel file .env")
# Use model from environment variable if available
model_size = os.getenv("WHISPER_MODEL", args.model)
# Use number of speakers from environment variable if available and not provided via argument
num_speakers = args.num_speakers
if num_speakers is None and os.getenv("NUM_SPEAKERS"):
try:
num_speakers = int(os.getenv("NUM_SPEAKERS"))
except ValueError:
pass
# Use fix-start from environment variable if available and not provided via argument
fix_start = args.fix_start
if not fix_start and os.getenv("FIX_START", "").lower() == "true":
fix_start = True
# Definizione dei nomi dei file di output
output_prefix = os.getenv("OUTPUT_PREFIX", args.output_prefix)
TRANSCRIPT_FILE = f"{output_prefix}.txt"
OUTPUT_FILE = f"{output_prefix}_final.txt"
SRT_FILE = f"{output_prefix}.srt"
VTT_FILE = f"{output_prefix}.vtt"
if not os.path.exists(args.audio):
raise ValueError(f"File audio {args.audio} non trovato")
# Aggiungi silenzio all'inizio se richiesto
input_audio = args.audio
if fix_start:
extended_audio = "extended_" + os.path.basename(args.audio)
input_audio = extend_audio_beginning(args.audio, extended_audio)
# Trascrivi l'audio
segments = transcribe_audio(input_audio, model_size)
# Esegui diarizzazione e abbina trascrizione a speaker
if not args.no_diarization:
diarization = diarize_audio(input_audio, hf_token, num_speakers)
match_transcript_to_speakers(segments, diarization, args.min_segment)
else:
print("🛑 Diarization disabilitata. Salvo solo la trascrizione.")
with open(OUTPUT_FILE, "w", encoding="utf-8") as f_out, open(SRT_FILE, "w", encoding="utf-8") as f_srt, open(VTT_FILE, "w", encoding="utf-8") as f_vtt:
f_vtt.write("WEBVTT\n\n")
for i, s in enumerate(segments, 1):
start = format_time(s['start'])
end = format_time(s['end'])
f_out.write(f"({start} - {end}): {s['text'].strip()}\n")
f_srt.write(f"{i}\n{format_time(s['start'], True)} --> {format_time(s['end'], True)}\n{s['text'].strip()}\n\n")
f_vtt.write(f"{start} --> {end}\n{s['text'].strip()}\n\n")
print(f"✅ Output salvato senza diarizzazione: {OUTPUT_FILE}, {SRT_FILE}, {VTT_FILE}")
# Rimuovi file audio esteso se creato
    if fix_start and os.path.exists(extended_audio):
os.remove(extended_audio)

10  requirements.txt  Normal file

@@ -0,0 +1,10 @@
openai-whisper
torch>=1.13.1
numpy>=1.20.0
pyannote.audio>=3.0.0
python-dotenv>=0.19.0
pydub>=0.25.1
tqdm>=4.64.0
matplotlib>=3.5.0
scikit-learn>=1.0.0
soundfile>=0.10.3

113  setup.sh  Executable file

@@ -0,0 +1,113 @@
#!/bin/bash
set -e # Exit on error
echo "🚀 Setting up Transcription Runner..."
# Check if Python is installed
if ! command -v python3 &> /dev/null; then
echo "❌ Python 3 not found. Please install Python 3 before proceeding."
exit 1
fi
# Check if pip is installed
if ! command -v pip3 &> /dev/null; then
echo "❌ pip3 not found. Please install pip3 before proceeding."
exit 1
fi
# Check if AWS CLI is installed
if ! command -v aws &> /dev/null; then
echo "⚠️ AWS CLI not found. Installing..."
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS
if command -v brew &> /dev/null; then
brew install awscli
else
echo "❌ Homebrew not found. Please install AWS CLI manually."
exit 1
fi
elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
# Linux
if command -v apt-get &> /dev/null; then
sudo apt-get update
sudo apt-get install -y awscli
elif command -v yum &> /dev/null; then
sudo yum install -y awscli
else
echo "❌ Unable to detect package manager. Please install AWS CLI manually."
exit 1
fi
else
echo "❌ Unsupported OS. Please install AWS CLI manually."
exit 1
fi
fi
# Check if netcat is installed
if ! command -v nc &> /dev/null; then
echo "⚠️ netcat not found. Installing..."
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS
if command -v brew &> /dev/null; then
brew install netcat
else
echo "❌ Homebrew not found. Please install netcat manually."
exit 1
fi
elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
# Linux
if command -v apt-get &> /dev/null; then
sudo apt-get update
sudo apt-get install -y netcat
elif command -v yum &> /dev/null; then
sudo yum install -y netcat
else
echo "❌ Unable to detect package manager. Please install netcat manually."
exit 1
fi
else
echo "❌ Unsupported OS. Please install netcat manually."
exit 1
fi
fi
# Create virtual environment
echo "🔨 Creating Python virtual environment..."
python3 -m venv venv
source venv/bin/activate
# Install dependencies
echo "📦 Installing dependencies..."
pip install -r requirements.txt
# Create .env file if it doesn't exist
if [ ! -f .env ]; then
echo "📝 Creating .env file from template..."
cp .env.sample .env
echo " Please edit the .env file with your configuration before running the scripts."
fi
# Make shell scripts executable
echo "🔑 Making scripts executable..."
chmod +x whisper_parallel.sh
# Set up AWS credentials if needed
if ! aws configure list &> /dev/null; then
echo "⚠️ AWS credentials not configured. Setting up..."
echo "Please enter your AWS credentials:"
aws configure
fi
# Check if AWS key pair exists
KEY_NAME=$(grep -E '^KEY_NAME=' .env 2>/dev/null | cut -d '=' -f2)
KEY_NAME=${KEY_NAME:-whisper-key}
if ! aws ec2 describe-key-pairs --key-names "$KEY_NAME" &> /dev/null; then
echo "🔑 Creating EC2 key pair..."
mkdir -p ~/.ssh
aws ec2 create-key-pair --key-name "$KEY_NAME" --query 'KeyMaterial' --output text > ~/.ssh/"$KEY_NAME".pem
chmod 400 ~/.ssh/"$KEY_NAME".pem
echo "✅ Key pair created: ~/.ssh/$KEY_NAME.pem"
fi
echo "✅ Setup complete! You can now run ./whisper_parallel.sh"
echo " Remember to edit the .env file with your configuration."

78  test_fix.py  Normal file

@@ -0,0 +1,78 @@
import json
def split_long_segments(segments, max_chars=150):
"""Split segments that are too long into smaller chunks."""
import re
new_segments = []
for segment in segments:
if "text" in segment and len(segment["text"]) > max_chars:
# Split text at sentence boundaries or by character count
sentences = re.split(r'(?<=[.!?]) +', segment["text"])
current_text = ""
start_time = segment["start"]
for sentence in sentences:
if len(current_text) + len(sentence) > max_chars and current_text:
# Calculate proportional time based on text length
portion = len(current_text) / len(segment["text"])
mid_time = segment["start"] + portion * (segment["end"] - segment["start"])
new_segments.append({
"start": start_time,
"end": mid_time,
"text": current_text.strip(),
"speaker": segment.get("speaker", ""),
"speaker_text": segment.get("speaker_text", f"[{segment.get('speaker', '')}] ") # Fixed line
})
start_time = mid_time
current_text = sentence
else:
current_text += " " + sentence if current_text else sentence
# Add the last part
if current_text:
new_segments.append({
"start": start_time,
"end": segment["end"],
"text": current_text.strip(),
"speaker": segment.get("speaker", ""),
"speaker_text": segment.get("speaker_text", f"[{segment.get('speaker', '')}] ") # Fixed line
})
else:
new_segments.append(segment)
return new_segments
# Create mock test data
test_segments = [
{
"start": 0.0,
"end": 10.0,
"speaker": "SPEAKER_00",
"speaker_text": "[SPEAKER_00] ",
"text": "This is a very long text that exceeds the maximum character limit. It should be split into multiple segments. This is another sentence to make sure we have enough text to split. And one more sentence to be really sure."
}
]
# Run the split_long_segments function
print("Testing split_long_segments...")
split_segments = split_long_segments(test_segments, max_chars=50)
print(f"Number of segments after splitting: {len(split_segments)}")
# Verify that all segments have the speaker_text field
all_have_speaker_text = all("speaker_text" in segment for segment in split_segments)
print(f"All segments have speaker_text field: {all_have_speaker_text}")
# Dump the result to inspect
print("\nSplit segments:")
print(json.dumps(split_segments, indent=2))
# Check if we can access speaker_text without error
try:
for segment in split_segments:
speaker_text = segment["speaker_text"]
print("\n✅ Successfully accessed speaker_text on all segments")
except KeyError as e:
print(f"\n❌ KeyError when accessing: {e}")

458  whisper_parallel.sh  Executable file

@@ -0,0 +1,458 @@
#!/bin/bash
# Load environment variables from .env file
if [ -f .env ]; then
echo "Loading environment variables from .env file..."
set -o allexport
source .env
set +o allexport
else
echo "Warning: .env file not found. Using default values."
fi
# === CONFIGURAZIONE ===
# These defaults will be used if not set in .env file
KEY_NAME=${KEY_NAME:-"whisper-key"}
KEY_FILE=${KEY_FILE:-"$HOME/.ssh/${KEY_NAME}.pem"}
SECURITY_GROUP=${SECURITY_GROUP:-"whisper-sg"}
INSTANCE_TYPE=${INSTANCE_TYPE:-"g4dn.12xlarge"} # Default instance type (4x T4 GPUs); check your vCPU quota in the region
REGION=${REGION:-"eu-south-1"}
AMI_ID=${AMI_ID:-"ami-059603706d3734615"}
VIDEO_FILE=${VIDEO_FILE:-"mio_video.mp4"}
ORIGINAL_FILENAME=$(basename "$VIDEO_FILE" | cut -d. -f1)
START_MIN=${START_MIN:-0} # Default value if not set
END_MIN=${END_MIN:-0} # Default value if not set
SHIFT_SECONDS=${SHIFT_SECONDS:-0} # Shift timestamps by this many seconds
SHIFT_ONLY=${SHIFT_ONLY:-false} # Set to true to only perform shifting on existing files
INPUT_PREFIX=${INPUT_PREFIX:-""} # Prefix for input files when using SHIFT_ONLY
GPU_COUNT=${GPU_COUNT:-1} # Numero di GPU da utilizzare (default: 1)
NUM_SPEAKERS=${NUM_SPEAKERS:-""} # Numero di speaker se conosciuto (opzionale)
FIX_START=${FIX_START:-"true"} # Aggiunge silenzio all'inizio per catturare i primi secondi
# === FUNZIONE PER SHIFT DEI TIMESTAMPS ===
shift_timestamps() {
local input_file=$1
local output_file=$2
local shift_by=$3
local file_ext="${input_file##*.}"
if [ "$file_ext" = "srt" ]; then
echo "🕒 Shifting SRT timestamps by $shift_by seconds..."
# SRT format: 00:00:05,440 --> 00:00:08,300
awk -v shift=$shift_by '
function time_to_seconds(time_str) {
split(time_str, parts, ",")
split(parts[1], time_parts, ":")
return time_parts[1]*3600 + time_parts[2]*60 + time_parts[3] + parts[2]/1000
}
function seconds_to_time(seconds) {
h = int(seconds/3600)
m = int((seconds-h*3600)/60)
s = int(seconds-h*3600-m*60)
ms = int((seconds - int(seconds))*1000)
return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
}
{
if (match($0, /^([0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) --> ([0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})$/)) {
            # Fixed-width timestamps: 12 chars each, separated by " --> " (5 chars)
            start_time = time_to_seconds(substr($0, RSTART, 12))
            end_time = time_to_seconds(substr($0, RSTART + 17, 12))
new_start = start_time + shift
new_end = end_time + shift
# Handle negative times (not allowed in SRT)
if (new_start < 0) new_start = 0
if (new_end < 0) new_end = 0
print seconds_to_time(new_start)" --> "seconds_to_time(new_end)
} else {
print $0
}
}' "$input_file" > "$output_file"
elif [ "$file_ext" = "vtt" ]; then
echo "🕒 Shifting VTT timestamps by $shift_by seconds..."
# VTT format: 00:00:05.440 --> 00:00:08.300
awk -v shift=$shift_by '
function time_to_seconds(time_str) {
split(time_str, parts, ".")
split(parts[1], time_parts, ":")
return time_parts[1]*3600 + time_parts[2]*60 + time_parts[3] + parts[2]/1000
}
function seconds_to_time(seconds) {
h = int(seconds/3600)
m = int((seconds-h*3600)/60)
s = int(seconds-h*3600-m*60)
ms = int((seconds - int(seconds))*1000)
return sprintf("%02d:%02d:%02d.%03d", h, m, s, ms)
}
{
if (match($0, /^([0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}) --> ([0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3})$/)) {
            # Fixed-width timestamps: 12 chars each, separated by " --> " (5 chars)
            start_time = time_to_seconds(substr($0, RSTART, 12))
            end_time = time_to_seconds(substr($0, RSTART + 17, 12))
new_start = start_time + shift
new_end = end_time + shift
# Handle negative times
if (new_start < 0) new_start = 0
if (new_end < 0) new_end = 0
print seconds_to_time(new_start)" --> "seconds_to_time(new_end)
} else {
print $0
}
}' "$input_file" > "$output_file"
elif [ "$file_ext" = "txt" ]; then
echo "🕒 Shifting timestamps in TXT by $shift_by seconds..."
# For text files, we need to handle timestamps in formats like [00:05.440]
awk -v shift=$shift_by '
function time_to_seconds(time_str) {
# Remove brackets
gsub(/[\[\]]/, "", time_str)
# Check format - either MM:SS.mmm or HH:MM:SS.mmm
if (split(time_str, parts, ":") == 2) {
# MM:SS.mmm format
mm = parts[1]
split(parts[2], sec_parts, ".")
ss = sec_parts[1]
ms = sec_parts[2] ? sec_parts[2] : 0
return mm*60 + ss + ms/1000
} else {
# HH:MM:SS.mmm format
hh = parts[1]
mm = parts[2]
split(parts[3], sec_parts, ".")
ss = sec_parts[1]
ms = sec_parts[2] ? sec_parts[2] : 0
return hh*3600 + mm*60 + ss + ms/1000
}
}
function seconds_to_time(seconds) {
h = int(seconds/3600)
m = int((seconds-h*3600)/60)
s = seconds-h*3600-m*60
# Format with up to 3 decimal places for milliseconds
if (h > 0) {
return sprintf("[%02d:%02d:%05.3f]", h, m, s)
} else {
return sprintf("[%02d:%05.3f]", m, s)
}
}
{
line = $0
# Match timestamps in the format [MM:SS.mmm] or [HH:MM:SS.mmm]
while (match(line, /\[[0-9]+:[0-9]+(\.[0-9]+)?\]/) || match(line, /\[[0-9]+:[0-9]+:[0-9]+(\.[0-9]+)?\]/)) {
time_str = substr(line, RSTART, RLENGTH)
time_sec = time_to_seconds(time_str)
new_time = time_sec + shift
if (new_time < 0) new_time = 0
new_time_str = seconds_to_time(new_time)
# Replace the timestamp
line = substr(line, 1, RSTART-1) new_time_str substr(line, RSTART+RLENGTH)
}
print line
}' "$input_file" > "$output_file"
else
echo "⚠️ Unsupported file extension for shifting: $file_ext"
cp "$input_file" "$output_file"
fi
}
# If we're only shifting timestamps, do that and exit
if [ "$SHIFT_ONLY" = "true" ]; then
if [ -z "$INPUT_PREFIX" ]; then
echo "❌ ERROR: When using SHIFT_ONLY=true, you must specify INPUT_PREFIX"
exit 1
fi
echo "🕒 Performing timestamp shifting by $SHIFT_SECONDS seconds..."
# Process each file type
for ext in txt srt vtt; do
# Check for regular transcript
if [ -f "${INPUT_PREFIX}.${ext}" ]; then
shift_timestamps "${INPUT_PREFIX}.${ext}" "${INPUT_PREFIX}_shifted.${ext}" $SHIFT_SECONDS
echo "✅ Created ${INPUT_PREFIX}_shifted.${ext}"
fi
# Check for final transcript
if [ -f "${INPUT_PREFIX}_final.${ext}" ]; then
shift_timestamps "${INPUT_PREFIX}_final.${ext}" "${INPUT_PREFIX}_final_shifted.${ext}" $SHIFT_SECONDS
echo "✅ Created ${INPUT_PREFIX}_final_shifted.${ext}"
fi
done
echo "✅ Timestamp shifting complete!"
exit 0
fi
# Generate random suffix
if command -v openssl > /dev/null 2>&1; then
RANDOM_SUFFIX=$(openssl rand -hex 4)
elif command -v md5sum > /dev/null 2>&1; then
RANDOM_SUFFIX=$(date +%s | md5sum | head -c 8)
elif command -v shasum > /dev/null 2>&1; then
RANDOM_SUFFIX=$(date +%s | shasum | head -c 8)
else
RANDOM_SUFFIX=$RANDOM$RANDOM
fi
AUDIO_FILE="${ORIGINAL_FILENAME}_${START_MIN}_${END_MIN}_${RANDOM_SUFFIX}.wav"
DIARIZATION_ENABLED=${DIARIZATION_ENABLED:-true}
HF_TOKEN=${HF_TOKEN:-""}
BUCKET_NAME=${BUCKET_NAME:-"whisper-video-transcripts"}
# Output file names with the same format
TRANSCRIPT_PREFIX="${ORIGINAL_FILENAME}_${START_MIN}_${END_MIN}_${RANDOM_SUFFIX}"
TRANSCRIPT_FILE="${TRANSCRIPT_PREFIX}.txt"
FINAL_TRANSCRIPT_FILE="${TRANSCRIPT_PREFIX}_final.txt"
SRT_FILE="${TRANSCRIPT_PREFIX}.srt"
VTT_FILE="${TRANSCRIPT_PREFIX}.vtt"
# === CONTROLLI PRELIMINARI ===
if [ ! -f "$KEY_FILE" ]; then
echo "❌ Chiave SSH non trovata in $KEY_FILE"
exit 1
fi
if [ ! -f "parallel_transcript.py" ]; then
echo "❌ File parallel_transcript.py non trovato"
exit 1
fi
if [ ! -f "$VIDEO_FILE" ]; then
echo "❌ File video $VIDEO_FILE non trovato"
exit 1
fi
# === CONVERTI MP4 IN WAV E APPLICA CROP PRIMA DELL'UPLOAD ===
echo "🎙️ Converto $VIDEO_FILE in $AUDIO_FILE con crop applicato..."
FFMPEG_CMD="ffmpeg -i \"$VIDEO_FILE\""
# Aggiungi parametri di crop se START_MIN o END_MIN sono impostati
if [ "$START_MIN" != "0" ] || [ "$END_MIN" != "0" ]; then
START_SEC=$((START_MIN * 60))
if [ "$END_MIN" != "0" ]; then
END_SEC=$((END_MIN * 60))
FFMPEG_CMD+=" -ss $START_SEC -to $END_SEC"
else
FFMPEG_CMD+=" -ss $START_SEC"
fi
echo "⏱️ Crop video da $START_MIN min a ${END_MIN:-fine} min"
fi
# Completa il comando ffmpeg con gli altri parametri necessari
FFMPEG_CMD+=" -ac 1 -ar 16000 -vn \"$AUDIO_FILE\" -y"
# Esegui il comando ffmpeg
eval $FFMPEG_CMD
echo "☁️ Controllo se l'audio è già presente su S3..."
AUDIO_UPLOADED=""
if ! aws s3 ls s3://$BUCKET_NAME/$AUDIO_FILE >/dev/null 2>&1; then
echo "⬆️ Carico $AUDIO_FILE su S3..."
aws s3 cp $AUDIO_FILE s3://$BUCKET_NAME/
AUDIO_UPLOADED="true"
else
echo "✅ Audio già presente su S3. Salto upload."
fi
# === CONTROLLA O CREA LA DEFAULT VPC ===
echo "🔍 Controllo default VPC nella regione $REGION..."
DEFAULT_VPC_ID=$(aws ec2 describe-vpcs --region $REGION --filters Name=isDefault,Values=true --query "Vpcs[0].VpcId" --output text)
if [ "$DEFAULT_VPC_ID" = "None" ]; then
echo " Nessuna default VPC trovata. La creo..."
DEFAULT_VPC_ID=$(aws ec2 create-default-vpc --region $REGION --query "Vpc.VpcId" --output text)
echo "✅ Default VPC creata: $DEFAULT_VPC_ID"
else
echo "✅ Default VPC esistente: $DEFAULT_VPC_ID"
fi
# === CREA SECURITY GROUP SE NECESSARIO ===
aws ec2 describe-security-groups --group-names $SECURITY_GROUP --region $REGION &>/dev/null
if [ $? -ne 0 ]; then
echo " Creo security group $SECURITY_GROUP..."
aws ec2 create-security-group --group-name $SECURITY_GROUP --description "Whisper SG" --vpc-id $DEFAULT_VPC_ID --region $REGION
aws ec2 authorize-security-group-ingress --group-name $SECURITY_GROUP --protocol tcp --port 22 --cidr 0.0.0.0/0 --region $REGION
fi
# === AVVIA L'ISTANZA EC2 ===
echo "🚀 Avvio istanza EC2 GPU ($INSTANCE_TYPE con GPU)..."
INSTANCE_ID=$(aws ec2 run-instances \
--image-id $AMI_ID \
--instance-type $INSTANCE_TYPE \
--key-name $KEY_NAME \
--security-groups $SECURITY_GROUP \
--iam-instance-profile Name=WhisperS3Profile \
--block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":50}}]' \
--tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=whisper-runner}]" \
--region $REGION \
--query "Instances[0].InstanceId" \
--output text)
if [ -z "$INSTANCE_ID" ]; then
echo "❌ ERRORE: ID istanza non ottenuto. Verifica che l'AMI sia corretta per la regione $REGION."
exit 1
fi
echo "🆔 Istanza avviata: $INSTANCE_ID"
# === FUNZIONE DI CLEANUP IN CASO DI USCITA IMPROVVISA ===
function cleanup {
echo "🧨 Cleanup in corso..."
# Rimuove il file audio locale se esiste
if [ -f "$AUDIO_FILE" ]; then
echo "🧹 Rimuovo file audio locale $AUDIO_FILE..."
rm -f "$AUDIO_FILE"
echo "✅ File audio locale rimosso."
fi
# Rimuove l'audio da S3 se è stato caricato in questo script
if [ "$AUDIO_UPLOADED" = "true" ]; then
echo "🧹 Rimuovo $AUDIO_FILE da S3..."
aws s3 rm s3://$BUCKET_NAME/$AUDIO_FILE
echo "✅ File rimosso da S3."
fi
# Termina l'istanza EC2 se è stata avviata
if [ -n "$INSTANCE_ID" ]; then
echo "🧹 Termino l'istanza EC2 ($INSTANCE_ID)..."
aws ec2 terminate-instances --instance-ids $INSTANCE_ID --region $REGION >/dev/null
# Aspetta la terminazione con timeout
echo "⏳ Aspetto la terminazione dell'istanza (max 60 secondi)..."
WAIT_TIMEOUT=60
WAIT_START=$(date +%s)
WAITING=true
while [ "$WAITING" = true ]; do
# Controlla lo stato dell'istanza
STATUS=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --region $REGION --query "Reservations[0].Instances[0].State.Name" --output text 2>/dev/null)
# Se lo stato è terminated o l'istanza non esiste più, esci dal ciclo
if [ "$STATUS" = "terminated" ] || [ "$STATUS" = "None" ]; then
echo "✅ Istanza terminata con successo."
WAITING=false
else
# Controlla se è scaduto il timeout
WAIT_ELAPSED=$(($(date +%s) - WAIT_START))
if [ $WAIT_ELAPSED -ge $WAIT_TIMEOUT ]; then
echo "⚠️ Timeout durante l'attesa della terminazione. L'istanza potrebbe essere ancora in fase di terminazione."
WAITING=false
else
# Aspetta un secondo prima di controllare di nuovo
sleep 2
echo -n "."
fi
fi
done
fi
}
# Esegui cleanup su qualsiasi uscita: normale, errore, o Ctrl+C
trap cleanup EXIT
echo "⏳ Attendo che sia pronta..."
aws ec2 wait instance-running --instance-ids $INSTANCE_ID --region $REGION
echo "🔐 Aspetto che l'istanza sia pronta per SSH..."
for i in {1..35}; do
PUBLIC_IP=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --region $REGION --query "Reservations[0].Instances[0].PublicIpAddress" --output text)
echo "🌍 IP pubblico: $PUBLIC_IP"
nc -zv $PUBLIC_IP 22 >/dev/null 2>&1
if [ $? -eq 0 ]; then
echo "✅ Porta 22 aperta, l'istanza è pronta!"
break
else
echo "⏳ Tentativo $i/35: porta 22 ancora chiusa. Riprovo tra 5s..."
sleep 5
fi
done
# === CARICA SCRIPT PYTHON SULL'ISTANZA ===
echo "📦 Carico script sulla macchina EC2..."
scp -o StrictHostKeyChecking=no -i $KEY_FILE parallel_transcript.py ubuntu@$PUBLIC_IP:/home/ubuntu/
scp -o StrictHostKeyChecking=no -i $KEY_FILE .env ubuntu@$PUBLIC_IP:/home/ubuntu/
scp -o StrictHostKeyChecking=no -i $KEY_FILE requirements.txt ubuntu@$PUBLIC_IP:/home/ubuntu/
echo "⚙️ Scarico audio da S3 ed eseguo trascrizione avanzata..."
ssh -t -i $KEY_FILE -o "SendEnv=TERM" ubuntu@$PUBLIC_IP "
# Prevent broken pipe errors
export PYTHONUNBUFFERED=1
set -e
cd /home/ubuntu
echo '⬇️ Download da S3...'
aws s3 cp s3://$BUCKET_NAME/$AUDIO_FILE /home/ubuntu/$AUDIO_FILE --region $REGION
echo '📦 File scaricato:'
ls -lh $AUDIO_FILE
echo '⚙️ Attivo ambiente virtuale...'
source whisper-env/bin/activate
# Installa PyDub se non presente
if ! pip list | grep -q pydub; then
echo '📦 Installo dipendenze mancanti...'
pip install pydub
fi
# Installa le dipendenze da requirements.txt
pip install -r requirements.txt
echo '🖥️ Informazioni GPU:'
nvidia-smi
echo 'Audio file: $AUDIO_FILE'
echo 'Token Hugging Face: $HF_TOKEN'
echo 'Diarization enabled: $DIARIZATION_ENABLED'
echo 'Numero di speaker: $NUM_SPEAKERS'
echo '✍️ Lancio trascrizione avanzata...'
CMD=\"python3 parallel_transcript.py --audio $AUDIO_FILE --token $HF_TOKEN \
--output-prefix $TRANSCRIPT_PREFIX\"
if [ \"$DIARIZATION_ENABLED\" = false ]; then
CMD+=\" --no-diarization\"
fi
if [ -n \"$NUM_SPEAKERS\" ]; then
CMD+=\" --num-speakers $NUM_SPEAKERS\"
echo '👥 Utilizzo numero di speaker specificato: $NUM_SPEAKERS'
fi
if [ \"$FIX_START\" = true ]; then
CMD+=\" --fix-start\"
echo '⏱️ Aggiunta correzione per i primi secondi'
fi
eval \$CMD
"
# === SCARICA I FILE ===
echo "⬇️ Scarico i file di output..."
scp -i $KEY_FILE ubuntu@$PUBLIC_IP:/home/ubuntu/${TRANSCRIPT_PREFIX}_final.txt . || echo "⚠️ Impossibile scaricare _final.txt (potrebbe non essere stato generato)"
scp -i $KEY_FILE ubuntu@$PUBLIC_IP:/home/ubuntu/${TRANSCRIPT_PREFIX}.txt . || echo "⚠️ Impossibile scaricare .txt"
scp -i $KEY_FILE ubuntu@$PUBLIC_IP:/home/ubuntu/${TRANSCRIPT_PREFIX}.srt . || echo "⚠️ Impossibile scaricare .srt"
scp -i $KEY_FILE ubuntu@$PUBLIC_IP:/home/ubuntu/${TRANSCRIPT_PREFIX}.vtt . || echo "⚠️ Impossibile scaricare .vtt"
# Scarica anche i file JSON con dati aggiuntivi per debugging
scp -i $KEY_FILE ubuntu@$PUBLIC_IP:/home/ubuntu/${TRANSCRIPT_PREFIX}.txt.words.json . 2>/dev/null || true
scp -i $KEY_FILE ubuntu@$PUBLIC_IP:/home/ubuntu/${TRANSCRIPT_PREFIX}_final.txt.diarization.json . 2>/dev/null || true
scp -i $KEY_FILE ubuntu@$PUBLIC_IP:/home/ubuntu/${TRANSCRIPT_PREFIX}_final.txt.overlaps.json . 2>/dev/null || true
echo "📄 File scaricati:"
ls -lh ${TRANSCRIPT_PREFIX}* 2>/dev/null || echo "⚠️ Nessun file trovato con il prefisso specificato"