[Solved] Decoding encoded strings (mail)

JujuLand · 14/11/2016, 15:35

Hello, my goal is to correctly decode the encoded strings of e-mails stored in mbox files.

After getting the job done empirically, I wanted to optimize this processing, which was a bit of a mess. So I reworked my tool, and I have hit a snag that I can't get past.

Run the following code (Python) and tell me where I am going wrong. There are 4 different strings: the first three are decoded perfectly, the 4th is not, yet Thunderbird decodes it perfectly.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import mailbox
import base64
import os
import sys
import email
import subprocess
import string
from string import upper
import re

###################################################################################
def decode(texte):
###################################################################################

	if re.search("\?UTF-8\?",texte.upper()) is not None :
		code="utf-8"
	else:
		code="iso-8859"
	ll = email.header.decode_header(texte)
	texte0=""
	keep=0
	for l in ll:
		print l
		texte0=re.sub("\n","",l[0])
		break
	if code == "utf-8":
		texte=decode_utf8(texte0)
	else :
		texte=decode_ansi(texte0)

	return texte

###################################################################################
def decode_ansi(texte):
###################################################################################
	texte=re.sub("\x80","",texte)
	texte=re.sub("\x85","",texte)
	texte=re.sub("\x94",'"',texte)
	texte=re.sub("\x93",'"',texte)
	texte=re.sub("\x9c",'oe',texte)
	texte=re.sub("\x99",'',texte)
	texte=re.sub("\xa0","",texte)
	texte=re.sub("\xa6","",texte)
	texte=re.sub("\xb0","'",texte)
	texte=re.sub("\xb9","'",texte)
	texte=re.sub("\xa0","",texte)
	texte=re.sub("\xab","'",texte)
	texte=re.sub("\xb4","",texte)
	texte=re.sub("\xb8","",texte)
	texte=re.sub("\xbb","'",texte)
	texte=re.sub("\xbf","",texte)

	texte=re.sub("\x80","A",texte)
	texte=re.sub("\xc0","A",texte)
	texte=re.sub("\xc1","A",texte)
	texte=re.sub("\xc2","A",texte)
	texte=re.sub("\xc3","A",texte)
	texte=re.sub("\xc4","A",texte)
	texte=re.sub("\xc5","A",texte)
	texte=re.sub("\xc7","C",texte)
	texte=re.sub("\xc8","E",texte)
	texte=re.sub("\xc9","E",texte)
	texte=re.sub("\xca","E",texte)
	texte=re.sub("\xcb","E",texte)
	texte=re.sub("\xcc","I",texte)
	texte=re.sub("\xcd","I",texte)
	texte=re.sub("\xce","I",texte)
	texte=re.sub("\xcf","I",texte)
	texte=re.sub("\xd1","N",texte)
	texte=re.sub("\xd2","O",texte)
	texte=re.sub("\xd3","O",texte)
	texte=re.sub("\xd4","O",texte)
	texte=re.sub("\xd5","O",texte)
	texte=re.sub("\xd6","O",texte)
	texte=re.sub("\xd9","U",texte)
	texte=re.sub("\xda","U",texte)
	texte=re.sub("\xdb","U",texte)
	texte=re.sub("\xdf","U",texte)
	texte=re.sub("\xdd","Y",texte)
	texte=re.sub("\xe0","a",texte)
	texte=re.sub("\xe1","a",texte)
	texte=re.sub("\xe2","a",texte)
	texte=re.sub("\xe3","a",texte)
	texte=re.sub("\xe4","a",texte)
	texte=re.sub("\xe5","a",texte)
	texte=re.sub("\xe7","c",texte)
	texte=re.sub("\xe8","e",texte)
	texte=re.sub("\xe9","e",texte)
	texte=re.sub("\xea","e",texte)
	texte=re.sub("\xeb","e",texte)
	texte=re.sub("\xec","ì",texte)
	texte=re.sub("\xed","í",texte)
	texte=re.sub("\xee","i",texte)
	texte=re.sub("\xef","i",texte)
	texte=re.sub("\xf1","n",texte)
	texte=re.sub("\xf2","o",texte)
	texte=re.sub("\xf3","o",texte)
	texte=re.sub("\xf4","o",texte)
	texte=re.sub("\xf5","o",texte)
	texte=re.sub("\xf6","o",texte)
	texte=re.sub("\xf9","u",texte)
	texte=re.sub("\xfa","u",texte)
	texte=re.sub("\xfb","u",texte)
	texte=re.sub("\xfc","u",texte)
	texte=re.sub("\xfd","y",texte)
	texte=re.sub("\xff","y",texte)

	return texte

###################################################################################
def decode_utf8(texte):
###################################################################################
	texte=re.sub("\xc3\x80","A",texte)
	texte=re.sub("\xc3\x81","A",texte)
	texte=re.sub("\xc3\x82","A",texte)
	texte=re.sub("\xc3\x83","A",texte)
	texte=re.sub("\xc3\x84","A",texte)
	texte=re.sub("\xc3\x85","A",texte)
	texte=re.sub("\xc3\x86","A",texte)
	texte=re.sub("\xc3\x87","C",texte)
	texte=re.sub("\xc3\x88","E",texte)
	texte=re.sub("\xc3\x89","E",texte)
	texte=re.sub("\xc3\x8a","E",texte)
	texte=re.sub("\xc3\x8b","E",texte)
	texte=re.sub("\xc3\x8c","I",texte)
	texte=re.sub("\xc3\x8d","I",texte)
	texte=re.sub("\xc3\x8e","I",texte)
	texte=re.sub("\xc3\x8f","I",texte)
	texte=re.sub("\xc3\x91","N",texte)
	texte=re.sub("\xc3\x92","O",texte)
	texte=re.sub("\xc3\x93","O",texte)
	texte=re.sub("\xc3\x94","O",texte)
	texte=re.sub("\xc3\x95","O",texte)
	texte=re.sub("\xc3\x96","O",texte)
	texte=re.sub("\xc3\x99","U",texte)
	texte=re.sub("\xc3\x9a","U",texte)
	texte=re.sub("\xc3\x9b","U",texte)
	texte=re.sub("\xc3\x9c","U",texte)
	texte=re.sub("\xc3\x9d","Y",texte)
	texte=re.sub("\xc3\xa0","a",texte)
	texte=re.sub("\xc3\xa1","a",texte)
	texte=re.sub("\xc3\xa2","a",texte)
	texte=re.sub("\xc3\xa3","a",texte)
	texte=re.sub("\xc3\xa4","a",texte)
	texte=re.sub("\xc3\xa5","a",texte)
	texte=re.sub("\xc3\xa7","c",texte)
	texte=re.sub("\xc3\xa8","e",texte)
	texte=re.sub("\xc3\xa9","e",texte)
	texte=re.sub("\xc3\xaa","e",texte)
	texte=re.sub("\xc3\xab","e",texte)
	texte=re.sub("\xc3\xac","ì",texte)
	texte=re.sub("\xc3\xad","í",texte)
	texte=re.sub("\xc3\xae","i",texte)
	texte=re.sub("\xc3\xaf","i",texte)
	texte=re.sub("\xc3\xb0","]",texte)
	texte=re.sub("\xc3\xb1","n",texte)
	texte=re.sub("\xc3\xb2","o",texte)
	texte=re.sub("\xc3\xb3","o",texte)
	texte=re.sub("\xc3\xb4","o",texte)
	texte=re.sub("\xc3\xb5","o",texte)
	texte=re.sub("\xc3\xb6","o",texte)
	texte=re.sub("\xc3\xb9","u",texte)
	texte=re.sub("\xc3\xba","u",texte)
	texte=re.sub("\xc3\xbb","u",texte)
	texte=re.sub("\xc3\xbc","u",texte)
	texte=re.sub("\xc3\xbd","y",texte)
	texte=re.sub("\xc3\xbf","y",texte)

	texte=re.sub("A\xc2\xa8","e",texte)
	texte=re.sub("A\xc2\xa9","e",texte)
	texte=re.sub("A\xc2\xaa","e",texte)
	texte=re.sub("A\xc2\xab","e",texte)
#	texte=re.sub("\xc2\x80","euro",texte)

	return texte

###################################################################################
def Clean_codage(texte):
###################################################################################
	texte=re.sub("\n  ","",texte)
	texte=re.sub("\n ","",texte)

	texte=re.sub("=\?utf-8\?q\?","",texte)
	texte=re.sub("=\?utf-8\?Q\?","",texte)
	texte=re.sub("=\?UTF-8\?Q\?","",texte)
	texte=re.sub("=\?iso-8859-1\?q\?","",texte)
	texte=re.sub("=\?iso-8859-1\?Q\?","",texte)
	texte=re.sub("=\?ISO-8859-1\?Q\?","",texte)
	texte=re.sub("=\?ISO-8859-15\?Q\?","",texte) 
	texte=re.sub("=\?Windows-1252\?Q\?","",texte)
	texte=re.sub("=\?windows-1252\?Q\?","",texte)
	texte=re.sub("=\?windows-1256\?Q\?","",texte)
	texte=re.sub("=\?Windows-1256\?Q\?","",texte)
	texte=re.sub("=\?windows-1258\?Q\?","",texte)
	texte=re.sub("=\?Windows-1258\?Q\?","",texte)

	texte=re.sub("=C3=80","A",texte)
	texte=re.sub("=C3=81","A",texte)
	texte=re.sub("=C3=82","A",texte)
	texte=re.sub("=C3=83","A",texte)
	texte=re.sub("=C3=84","A",texte)
	texte=re.sub("=C3=85","A",texte)
	texte=re.sub("=C3=86","Ae",texte)
	texte=re.sub("=C3=87","C",texte)
	texte=re.sub("=C3=88","E",texte)
	texte=re.sub("=C3=89","E",texte)
	texte=re.sub("=C3=8a","E",texte)
	texte=re.sub("=C3=8b","E",texte)
	texte=re.sub("=C3=8C","I",texte)
	texte=re.sub("=C3=8D","I",texte)
	texte=re.sub("=C3=8E","I",texte)
	texte=re.sub("=C3=8F","I",texte)
	texte=re.sub("=C3=91","N",texte)
	texte=re.sub("=C3=92","O",texte)
	texte=re.sub("=C3=93","O",texte)
	texte=re.sub("=C3=94","O",texte)
	texte=re.sub("=C3=95","O",texte)
	texte=re.sub("=C3=96","O",texte)
	texte=re.sub("=C3=99","U",texte)
	texte=re.sub("=C3=9A","U",texte)
	texte=re.sub("=C3=9B","U",texte)
	texte=re.sub("=C3=9C","U",texte)
	texte=re.sub("=C3=9D","Y",texte)
	texte=re.sub("=C3=A0","a",texte)
	texte=re.sub("=c3=a0","a",texte)
	texte=re.sub("=C3=A1","a",texte)
	texte=re.sub("=C3=A2","a",texte)
	texte=re.sub("=C3=A3","a",texte)
	texte=re.sub("=C3=A4","a",texte)
	texte=re.sub("=C3=A5","a",texte)
	texte=re.sub("=C3=A7","c",texte)
	texte=re.sub("=C3=A8","e",texte)
	texte=re.sub("=C3=A9","e",texte)
	texte=re.sub("=c3=a9","e",texte)
	texte=re.sub("=C3=AA","e",texte)
	texte=re.sub("=C3=AB","e",texte)
	texte=re.sub("=C3=AC","ì",texte)
	texte=re.sub("=C3=AD","i",texte)
	texte=re.sub("=C3=AE","i",texte)
	texte=re.sub("=C3=AF","i",texte)
	texte=re.sub("=C3=B1","n",texte)
	texte=re.sub("=C3=B2","o",texte)
	texte=re.sub("=C3=B3","o",texte)
	texte=re.sub("=C3=B4","o",texte)
	texte=re.sub("=C3=B5","o",texte)
	texte=re.sub("=C3=B6","o",texte)
	texte=re.sub("=C3=B9","u",texte)
	texte=re.sub("=C3=BA","u",texte)
	texte=re.sub("=C3=BB","u",texte)
	texte=re.sub("=C3=BC","u",texte)
	texte=re.sub("=C3=BD","y",texte)
	texte=re.sub("=C3=BF","ÿ",texte)

	texte=re.sub("=0A"," ",texte)
	texte=re.sub("=20"," ",texte)
	texte=re.sub("=21"," ",texte)
	texte=re.sub("=22"," ",texte)
	texte=re.sub("=26"," ",texte)
	texte=re.sub("=27"," ",texte)
	texte=re.sub("=28","[",texte)
	texte=re.sub("=29","]",texte)
	texte=re.sub("=2D",".",texte)
	texte=re.sub("=2E"," ",texte)
	texte=re.sub("=3A"," ",texte)
	texte=re.sub("=3B"," ",texte)
	texte=re.sub("=3D"," ",texte)
	texte=re.sub("=3E"," ",texte)
	texte=re.sub("=3F"," ",texte)
	texte=re.sub("=5D"," ",texte)
	texte=re.sub("=5F"," ",texte)
	texte=re.sub("=92"," ",texte) 
	texte=re.sub("=B0","]",texte) 
	texte=re.sub("=AB","[",texte) 
	texte=re.sub("=BB","]",texte) 
	texte=re.sub("=A0"," ",texte) 
	texte=re.sub("=AC"," ",texte) 
	texte=re.sub("=C0","A",texte)
	texte=re.sub("=C1","A",texte)
	texte=re.sub("=C2","A",texte)
	texte=re.sub("=C3","A",texte)
	texte=re.sub("=C4","A",texte)
	texte=re.sub("=C5","A",texte)
	texte=re.sub("=C7","C",texte)
	texte=re.sub("=C8","E",texte)
	texte=re.sub("=C9","E",texte)
	texte=re.sub("=CA","E",texte)
	texte=re.sub("=CB","E",texte)
	texte=re.sub("=CC","I",texte)
	texte=re.sub("=CD","I",texte)
	texte=re.sub("=CE","I",texte)
	texte=re.sub("=CF","I",texte)
	texte=re.sub("=D1","N",texte)
	texte=re.sub("=D2","O",texte)
	texte=re.sub("=D3","O",texte)
	texte=re.sub("=D4","O",texte)
	texte=re.sub("=D5","O",texte)
	texte=re.sub("=D6","O",texte)
	texte=re.sub("=D9","U",texte)
	texte=re.sub("=DA","U",texte)
	texte=re.sub("=DB","U",texte)
	texte=re.sub("=DC","U",texte)
	texte=re.sub("=DD","Y",texte)
	texte=re.sub("=E0","a",texte)
	texte=re.sub("=E1","a",texte)
	texte=re.sub("=E2","a",texte)
	texte=re.sub("=E3","a",texte)
	texte=re.sub("=E4","a",texte)
	texte=re.sub("=E5","a",texte)
	texte=re.sub("=E7","c",texte)
	texte=re.sub("=E8","e",texte)
	texte=re.sub("=E9","e",texte)
	texte=re.sub("=EA","e",texte)
	texte=re.sub("=EB","e",texte)
	texte=re.sub("=EC","i",texte)
	texte=re.sub("=ED","i",texte)
	texte=re.sub("=EE","i",texte)
	texte=re.sub("=EF","i",texte)
	texte=re.sub("=F1","n",texte)
	texte=re.sub("=F2","o",texte)
	texte=re.sub("=F3","o",texte)
	texte=re.sub("=F4","o",texte)
	texte=re.sub("=F5","o",texte)
	texte=re.sub("=F6","o",texte)
	texte=re.sub("=F9","u",texte)
	texte=re.sub("=FA","u",texte)
	texte=re.sub("=FB","u",texte)
	texte=re.sub("=FC","u",texte)
	texte=re.sub("=FD","y",texte)
	texte=re.sub("=FF","ÿ",texte)
	texte=re.sub("=3F","?",texte)
	texte=re.sub("=3A","_",texte)
	texte=re.sub("=2C",",",texte)
	texte=re.sub("=2F","",texte)
	texte=re.sub("\?=","",texte)

	if texte.find("?Q?") != -1 :
		if re.search("@",texte) is not None and ( re.search("\"",texte) is not None or re.search("'",texte) is not None ):
			texte=re.sub("\"","",texte)
			texte=re.sub("'","",texte)
	if element == "subject" :
		texte=re.sub("/","_",texte)
		texte=re.sub(":","_",texte)
		texte=re.sub("&eacute","é",texte)
		texte=re.sub(" utf-8 Q","",texte)
           
	texte=re.sub("\xc2\x80","euro",texte)
	texte=re.sub("\xc2\x92","'",texte)
	texte=re.sub("\xc2\x96","_",texte)
	texte=re.sub("\xc2\x9c","oe",texte)

	return texte        

###################################################################################
# Main
###################################################################################

element="subject"
subject="=?iso-8859-1?B?UmU6IHBsYXRhbmVzO1/nYV9jb250aW51ZQ==?="
print "\n========================================"
print "subject : %s" %(subject)
debut=""
fin=""
r=0
while re.search("=\?",subject) is not None:
	r=r+1
	sujet0=""
	atraiter=""
	todo=0
	sujet=""
	for l in subject:
		if todo == 0 and l[0] == "=":
			todo=1
		if todo == 1:
			atraiter=atraiter+l[0]
			continue
		if todo == 1 and l[0] == " ":
			sujet0=" "
			todo=0
			continue
		if todo == 0:
			sujet0=sujet0+l[0]
	print "atraiter : %s" %(atraiter)
	if atraiter != "" :
		remplacer=decode(atraiter)
		remplacer=Clean_codage(remplacer)
		print "remplacer : %s" %(remplacer)
		remplacer=re.sub("\n","",remplacer)
		remplacer=re.sub("\r","",remplacer)
		modif=""
		for l in atraiter:
			if l[0] == "?" :
				modif=modif+"\\"+l[0]
			else:
				modif=modif+l[0]
		print "modif : %s" %(modif)
		subject=re.sub(modif,remplacer,subject)
		print "subject : OK >>> %s\n" %(subject)
	if r == 2:
		break
print "\n========================================"
subject="Valeur =?UTF-8?B?ZXN0aW3DqWUgZGVzIFBMQVRBTkVTIERFIEJFR09VWCAgOiAx?=\n =?UTF-8?B?OTAgMDAwIGV1cm9zICEhIQ==?="
print "subject : %s" %(subject)
debut=""
fin=""
r=0
while re.search("=\?",subject) is not None:
	r=r+1
	sujet0=""
	atraiter=""
	todo=0
	sujet=""
	for l in subject:
		if todo == 0 and l[0] == "=":
			todo=1
		if todo == 1:
			atraiter=atraiter+l[0]
			continue
		if todo == 1 and l[0] == " ":
			sujet0=" "
			todo=0
			continue
		if todo == 0:
			sujet0=sujet0+l[0]
	print "atraiter : %s" %(atraiter)
	if atraiter != "" :
		remplacer=decode(atraiter)
		remplacer=Clean_codage(remplacer)
		print "remplacer : %s" %(remplacer)
		remplacer=re.sub("\n","",remplacer)
		remplacer=re.sub("\r","",remplacer)
		modif=""
		for l in atraiter:
			if l[0] == "?" :
				modif=modif+"\\"+l[0]
			else:
				modif=modif+l[0]
		print "modif : %s" %(modif)
		subject=re.sub(modif,remplacer,subject)
		print "subject : OK >>> %s\n" %(subject)
	if r == 2:
		break
print "\n========================================"
subject="=?utf-8?B?UmU6IFJlOl9fcGxhdGFuZXM7X8OnYV9jb250aW51ZQ==?="
print "subject : %s" %(subject)
debut=""
fin=""
r=0
while re.search("=\?",subject) is not None:
	r=r+1
	sujet0=""
	atraiter=""
	todo=0
	sujet=""
	for l in subject:
		if todo == 0 and l[0] == "=":
			todo=1
		if todo == 1:
			atraiter=atraiter+l[0]
			continue
		if todo == 1 and l[0] == " ":
			sujet0=" "
			todo=0
			continue
		if todo == 0:
			sujet0=sujet0+l[0]
	print "atraiter : %s" %(atraiter)
	if atraiter != "" :
		remplacer=decode(atraiter)
		remplacer=Clean_codage(remplacer)
		print "remplacer : %s" %(remplacer)
		remplacer=re.sub("\n","",remplacer)
		remplacer=re.sub("\r","",remplacer)
		modif=""
		for l in atraiter:
			if l[0] == "?" :
				modif=modif+"\\"+l[0]
			else:
				modif=modif+l[0]
		print "modif : %s" %(modif)
		subject=re.sub(modif,remplacer,subject)
		print "subject : OK >>> %s\n" %(subject)
	if r == 2:
		break
print "\n========================================"
subject="=?iso-8859-1?B?UmU6IFJFOl9SZTpfcGxhdGFuZXM7X+dhX2NvbnRpbnVl?="
print "subject : %s" %(subject)
debut=""
fin=""
r=0
while re.search("=\?",subject) is not None:
	r=r+1
	sujet0=""
	atraiter=""
	todo=0
	sujet=""
	for l in subject:
		if todo == 0 and l[0] == "=":
			todo=1
		if todo == 1:
			atraiter=atraiter+l[0]
			continue
		if todo == 1 and l[0] == " ":
			sujet0=" "
			todo=0
			continue
		if todo == 0:
			sujet0=sujet0+l[0]
	print "atraiter : %s" %(atraiter)
	if atraiter != "" :
		remplacer=decode(atraiter)
		remplacer=Clean_codage(remplacer)
		print "remplacer : %s" %(remplacer)
		remplacer=re.sub("\n","",remplacer)
		remplacer=re.sub("\r","",remplacer)
		modif=""
		for l in atraiter:
			if l[0] == "?" :
				modif=modif+"\\"+l[0]
			else:
				modif=modif+l[0]
		print "modif : %s" %(modif)
		subject=re.sub(modif,remplacer,subject)
		print "subject : KO >>> %s\n" %(subject)
	if r == 2:
		break

Thanks
Cheers

Last edited by JujuLand (15/11/2016, 18:43)
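The long substitution tables in decode_ansi, decode_utf8 and Clean_codage mostly map accented letters to their unaccented form. Assuming the header has already been decoded to a Unicode string, roughly the same effect can be sketched with the standard unicodedata module. This is Python 3 (the script above is Python 2), and strip_accents is a hypothetical helper name, not something from the original script. It does not cover ligatures such as œ or symbols such as the euro sign, which the original tables handle separately.

```python
import unicodedata

def strip_accents(text):
    """Drop combining marks so that accented letters become plain letters.

    Hypothetical helper: it assumes `text` is already a decoded str.
    """
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

plain = strip_accents("Re: platanes;_ça_continue")  # 'Re: platanes;_ca_continue'
```

Whether stripping accents is desirable at all depends on what the decoded subject is used for; this only mirrors what the tables appear to do.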

seebz · 14/11/2016, 18:05

I haven't tested it, but you might find it easier to use a better-suited module:
https://docs.python.org/2/library/quopri.html

JujuLand · 14/11/2016, 19:38

I don't think so:

This is used to decode “Q”-encoded headers

I have no problem with that type of encoding; my problem is with "B"-encoded strings, which is the subject of this post.

And that page implies that type "B" should be handled by the base64 module, which I am using ...

Thanks
Cheers

Last edited by JujuLand (14/11/2016, 19:48)
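To make the "B" case concrete, here is a minimal sketch of decoding one B-encoded word by hand with base64, using the 4th subject from the first post. It is written for Python 3 (the thread's script is Python 2); ENCODED_WORD and decode_b_word are hypothetical names introduced only for this illustration.

```python
import base64
import re

# One encoded-word has the shape =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=")

def decode_b_word(word):
    """Decode a single B-encoded word to text (hypothetical helper)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None or match.group(2).upper() != "B":
        raise ValueError("not a B-encoded word: %r" % word)
    charset, payload = match.group(1), match.group(3)
    raw = base64.b64decode(payload)   # base64 text -> raw bytes
    return raw.decode(charset)        # raw bytes -> text, using the declared charset

decoded = decode_b_word(
    "=?iso-8859-1?B?UmU6IFJFOl9SZTpfcGxhdGFuZXM7X+dhX2NvbnRpbnVl?="
)
# decoded == 'Re: RE:_Re:_platanes;_ça_continue'
```

The "+" in the payload is an ordinary base64 character, so the decoding step itself has no trouble with it.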

pingouinux · 14/11/2016, 22:20

Good evening,

JujuLand wrote: "There are 4 different strings: the first three are decoded perfectly, the 4th is not, yet Thunderbird decodes it perfectly."

Your setup is quite complicated.
The problem is that the 4th string contains a + sign, which also has to be preceded by \ (like ?, it is a regular-expression metacharacter, and the string is used as the pattern passed to re.sub).
I simplified your script a little by using a decodage function for the repeated sequences. Replace lines 356 to 528 (the last one) of your script with:

def decodage(subject):
   print "\n========================================"
   print "subject : %s" %(subject)
   debut=""
   fin=""
   r=0
   while re.search("=\?",subject) is not None:
        r=r+1
        sujet0=""
        atraiter=""
        todo=0
        sujet=""
        for l in subject:
                if todo == 0 and l == "=":
                        todo=1
                if todo == 1:
                        atraiter=atraiter+l
                        continue
                if todo == 1 and l == " ":
                        sujet0=" "
                        todo=0
                        continue
                if todo == 0:
                        sujet0=sujet0+l
        print "atraiter : %s" %(atraiter)
        if atraiter != "" :
                remplacer=decode(atraiter)
                remplacer=Clean_codage(remplacer)
                print "remplacer : %s" %(remplacer)
                remplacer=re.sub("\n","",remplacer)
                remplacer=re.sub("\r","",remplacer)
                modif=re.sub("\?","\\?",atraiter)
                modif=re.sub("\+","\\+",modif)
                print "modif : %s" %(modif)
                subject=re.sub(modif,remplacer,subject)
                print "subject : OK >>> %s\n" %(subject)
        if r == 2:
                break

###################################################################################
# Main
###################################################################################

element="subject"
decodage("=?iso-8859-1?B?UmU6IHBsYXRhbmVzO1/nYV9jb250aW51ZQ==?=")
decodage("Valeur =?UTF-8?B?ZXN0aW3DqWUgZGVzIFBMQVRBTkVTIERFIEJFR09VWCAgOiAx?=\n =?UTF-8?B?OTAgMDAwIGV1cm9zICEhIQ==?=")
decodage("=?utf-8?B?UmU6IFJlOl9fcGxhdGFuZXM7X8OnYV9jb250aW51ZQ==?=")
decodage("=?iso-8859-1?B?UmU6IFJFOl9SZTpfcGxhdGFuZXM7X+dhX2NvbnRpbnVl?=")
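The escaping issue can be reproduced in isolation. The sketch below (Python 3, with a made-up "Fwd: " prefix and a "DECODED" placeholder that are not in the original script) shows that escaping only "?" leaves "+" acting as a quantifier, and that re.escape from the standard library escapes every metacharacter at once.

```python
import re

encoded = "=?iso-8859-1?B?UmU6IFJFOl9SZTpfcGxhdGFuZXM7X+dhX2NvbnRpbnVl?="
subject = "Fwd: " + encoded   # hypothetical header containing the encoded word

# Escaping only "?" (what the original loop did): "X+" now means
# "one or more X", so the pattern no longer matches the literal text.
partial = encoded.replace("?", "\\?")
unchanged = re.sub(partial, "DECODED", subject)

# re.escape() escapes all regular-expression metacharacters, "+" included.
fixed = re.sub(re.escape(encoded), "DECODED", subject)

print(unchanged == subject)   # True: nothing was replaced
print(fixed)                  # Fwd: DECODED
```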

Added:
You can simplify further by using the replace method of the str type instead of the sub function of the re module.

def decodage(subject):
   print "\n========================================"
   print "subject : %s" %(subject)
   debut=""
   fin=""
   r=0
   while re.search("=\?",subject) is not None:
        r=r+1
        sujet0=""
        atraiter=""
        todo=0
        sujet=""
        for l in subject:
                if todo == 0 and l == "=":
                        todo=1
                if todo == 1:
                        atraiter=atraiter+l
                        continue
                if todo == 1 and l == " ":
                        sujet0=" "
                        todo=0
                        continue
                if todo == 0:
                        sujet0=sujet0+l
        print "atraiter : %s" %(atraiter)
        if atraiter != "" :
                remplacer=decode(atraiter)
                remplacer=Clean_codage(remplacer)
                print "remplacer : %s" %(remplacer)
                remplacer=remplacer.replace("\n","")
                remplacer=remplacer.replace("\r","")
                subject=subject.replace(atraiter,remplacer)
                print "subject : OK >>> %s\n" %(subject)
        if r == 2:
                break

###################################################################################
# Main
###################################################################################

element="subject"
decodage("=?iso-8859-1?B?UmU6IHBsYXRhbmVzO1/nYV9jb250aW51ZQ==?=")
decodage("Valeur =?UTF-8?B?ZXN0aW3DqWUgZGVzIFBMQVRBTkVTIERFIEJFR09VWCAgOiAx?=\n =?UTF-8?B?OTAgMDAwIGV1cm9zICEhIQ==?=")
decodage("=?utf-8?B?UmU6IFJlOl9fcGxhdGFuZXM7X8OnYV9jb250aW51ZQ==?=")
decodage("=?iso-8859-1?B?UmU6IFJFOl9SZTpfcGxhdGFuZXM7X+dhX2NvbnRpbnVl?=")

Correction: string -> str

Last edited by pingouinux (15/11/2016, 10:37)

JujuLand · 15/11/2016, 10:33

OK for the + that needs escaping; I hadn't noticed it.

As for putting the decoding in a function, that is of course how I normally do it; I'm not sure why I had duplicated it ...

As for subject.replace(chaine,chaine), I don't usually program in Python, and I didn't know about it.
I was in fact using functions I had found earlier in mbox-extract-attachments.

From a purely syntactic point of view, it seems to me there is no big difference between the two solutions.
I understand that since I already use the string library, there is no need to use re.

But is there an equivalent of re.search(chaine,subject), something like subject.search(chaine)?

Thanks
Cheers

Last edited by JujuLand (15/11/2016, 10:34)

pingouinux · 15/11/2016, 10:48

If you are not using regular expressions, re is generally not needed.

JujuLand wrote: "I understand that since I already use the string library"

I made a mistake in post #4 (my 22:20 message above): I was not referring to the string type but to str (I have corrected it).

JujuLand wrote: "But is there an equivalent of re.search(chaine,subject), something like subject.search(chaine)?"

Yes:

subject.find(chaine)

Here is an excerpt from the Python documentation:

Help on method_descriptor in str:
str.find = find(...)
S.find(sub [,start [,end]]) -> int

Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end]. Optional
arguments start and end are interpreted as in slice notation.

Return -1 on failure.
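As a small illustration of the help text above (a sketch with an arbitrary sample subject, not code from the thread): str.find returns a position rather than a match object, so the test is against -1, and when only a yes/no answer is needed the in operator reads more directly.

```python
subject = "Valeur =?UTF-8?B?OTAgMDAwIGV1cm9zICEhIQ==?="

position = subject.find("=?")      # index of the first occurrence
missing = subject.find("@")        # -1 because "@" does not occur
has_encoded_word = "=?" in subject # plain membership test, no index needed

print(position, missing, has_encoded_word)   # 7 -1 True
```

Note that index 0 is a valid result of find, so the condition should be written `!= -1` rather than relying on truthiness.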

JujuLand · 15/11/2016, 18:43

OK, thanks

I'll get on with my work

Cheers
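For reference, decode_header from email.header, which the script already calls, returns each fragment together with its declared charset, so the bytes can be decoded with that charset instead of being patched character by character. The following is a minimal Python 3 sketch under that assumption (the thread's script is Python 2); decode_subject is a hypothetical helper name and the Latin-1 fallback is an arbitrary choice.

```python
from email.header import decode_header

def decode_subject(raw_subject):
    """Return a header value as text (hypothetical helper)."""
    pieces = []
    for payload, charset in decode_header(raw_subject):
        if isinstance(payload, bytes):
            # Encoded fragments come back as bytes with their declared charset;
            # plain fragments next to them come back as bytes with charset None.
            try:
                pieces.append(payload.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # Unknown charset name: fall back to Latin-1 (an assumption).
                pieces.append(payload.decode("latin-1"))
        else:
            # A header without any encoded word is returned as a plain str.
            pieces.append(payload)
    return "".join(pieces)

subjects = [
    "=?iso-8859-1?B?UmU6IHBsYXRhbmVzO1/nYV9jb250aW51ZQ==?=",
    "Valeur =?UTF-8?B?ZXN0aW3DqWUgZGVzIFBMQVRBTkVTIERFIEJFR09VWCAgOiAx?=\n"
    " =?UTF-8?B?OTAgMDAwIGV1cm9zICEhIQ==?=",
    "=?utf-8?B?UmU6IFJlOl9fcGxhdGFuZXM7X8OnYV9jb250aW51ZQ==?=",
    "=?iso-8859-1?B?UmU6IFJFOl9SZTpfcGxhdGFuZXM7X+dhX2NvbnRpbnVl?=",
]
decoded = [decode_subject(s) for s in subjects]
```

With this approach no regular expression is built from the encoded text, so the "+" in the 4th subject never needs escaping.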

Ubuntu-fr
