The .mz Data Chain Behind 觅知网 AIPPT and Offline Conversion to PPTX (Reverse-Engineering Notes)

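Background#

觅知网 offers AIPPT and related features. When a user edits a deck online in the browser, the page maintains a semantic canvas model behind the scenes, and what gets saved to disk or passed through the API is usually not a .pptx file but a proprietary .mz payload.
To keep editing locally in PowerPoint, the .mz obtained from the online side first has to be restored to structured JSON and then mapped onto the Office object model.

This post is a personal reverse-engineering and learning record. It covers only the transport wrapper, the JSON shape, and the conversion implementation. It does not cover cracking memberships, bypassing payment, or bulk-scraping copyrighted assets; copyright in assets and finished decks is governed by the platform's terms.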

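Analysis#

1) Where the data comes from during online editing#

There are two typical paths (what you actually capture at the time takes precedence):

  • Save / sync endpoints: on "save", "export", or automatic sync, the editor submits to or fetches from the server a text payload (commonly base64, or already a JSON fragment), which the browser then merges into its application state.
  • Local file: if you have already saved a response body or clipboard content to a file, and its extension or content signature identifies it as .mz, the rest of the processing is the same as "download from a URL, then convert".

Key observation: .mz is not a PPTX ZIP package but a layered stack of "encoding + compression + JSON". Online editing essentially amounts to repeatedly reading and writing the multi-page canvas (boards) that this JSON describes.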

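2) Restoring .mz to JSON#

The file usually comes in one of two forms:

  1. A whole base64 string: decoding yields a zlib- or gzip-compressed byte stream, which decompresses to UTF-8 JSON. In practice you can try several common wbits values for zlib.decompress (such as 47 / 31 / 15) until json.loads succeeds.
  2. Already JSON: if the file starts with { once whitespace is stripped, call json.loads directly.

A further rule of thumb: if the raw bytes start with eJ once whitespace is stripped, this is most likely the typical prefix of zlib-compressed data wrapped in base64 (the base64 encoding of the zlib header bytes 0x78 0x9C), so the file can go straight through the base64-then-decompress pipeline.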
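The two entry forms and the eJ rule can be checked with a small round trip. This is a minimal sketch, not the site's actual encoder: pack_mz is a hypothetical helper that assumes the wrapper is simply JSON → zlib (default level) → base64, and unpack_mz mirrors the decoding steps described above.

```python
import base64
import json
import zlib


def pack_mz(obj: dict) -> bytes:
    """Hypothetical inverse of the wrapper: JSON -> zlib -> base64."""
    raw = json.dumps(obj, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(zlib.compress(raw))


def unpack_mz(payload: bytes) -> dict:
    """Decode either plain JSON or base64-wrapped zlib/gzip JSON."""
    text = payload.strip()
    if text.startswith(b"{"):
        return json.loads(text.decode("utf-8"))
    data = base64.b64decode(text)
    # 47 = 32 + 15: auto-detect a zlib or gzip header;
    # 31 = gzip only; 15 = zlib only.
    for wbits in (47, 31, 15):
        try:
            return json.loads(zlib.decompress(data, wbits).decode("utf-8"))
        except (zlib.error, ValueError):
            continue
    raise ValueError("not a recognised .mz payload")


sample = {"boards": [{"bgColor": "FFFFFF", "elements": []}]}
packed = pack_mz(sample)
print(packed[:2])  # header bytes 0x78 0x9C encode to b"eJ"
print(unpack_mz(packed) == sample)
```

Note that the eJ prefix is specific to the default compression level; other levels change the second header byte and therefore the second base64 character, which is why the prefix is only a hint and the wbits loop remains the real test.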

After restoring, it is worth saving a local .json copy with json.dump(..., indent=2, ensure_ascii=False), so you can check fields in an IDE or with jq before writing the conversion logic.
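Alongside the pretty-printed dump, a quick per-page tally of element types helps decide which branches the converter needs. The sketch below is illustrative only: summarize_boards and dump_pretty are hypothetical helper names, and sample_root is made-up data that follows the boards / elements / type layout described in this post.

```python
import json
from collections import Counter
from pathlib import Path


def dump_pretty(root: dict, path: Path) -> None:
    """Write the restored JSON in a diff- and jq-friendly form."""
    with path.open("w", encoding="utf-8") as fh:
        json.dump(root, fh, indent=2, ensure_ascii=False)


def summarize_boards(root: dict) -> list:
    """Return per-page element counts, descending into groups."""

    def count(elements, counter):
        for el in elements or []:
            counter[el.get("type", "?")] += 1
            if el.get("type") == "group":
                count(el.get("elements"), counter)

    summary = []
    for index, board in enumerate(root.get("boards") or [], start=1):
        counter = Counter()
        count(board.get("elements"), counter)
        summary.append({"page": index, "types": dict(counter)})
    return summary


# Made-up sample data, not taken from a real .mz file.
sample_root = {
    "boards": [
        {"elements": [{"type": "group",
                       "elements": [{"type": "img"}, {"type": "font"}]}]},
        {"elements": []},
    ]
}
print(summarize_boards(sample_root))
```

Calling dump_pretty(root, Path("restored.json")) afterwards gives the on-disk copy; the file name is arbitrary.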

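3) From JSON to PPTX#

In the root object, pagination is generally driven by boards: each board corresponds to one page and contains bgColor, bgImage, elements, and so on. Common element types include img / font / shape / line / group. Logical canvas coordinates are mostly based on 1280×720 and are scaled proportionally to the slide width and height (for example, the inch dimensions of a 16:9 slide).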
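The coordinate mapping is plain proportional scaling. As a rough sketch, assuming a 13.333 in × 7.5 in slide and the 914,400 EMU-per-inch unit that Office uses (canvas_to_emu is a name chosen here for illustration):

```python
EMU_PER_INCH = 914400  # English Metric Units per inch in Office files

CANVAS_W, CANVAS_H = 1280.0, 720.0          # logical canvas size
SLIDE_W = int(13.333 * EMU_PER_INCH)        # 16:9 slide width in EMU
SLIDE_H = int(7.5 * EMU_PER_INCH)           # 16:9 slide height in EMU


def canvas_to_emu(x: float, y: float, w: float, h: float):
    """Scale a canvas rectangle to slide EMU; width/height are at least 1."""
    left = int(x / CANVAS_W * SLIDE_W)
    top = int(y / CANVAS_H * SLIDE_H)
    width = max(1, int(w / CANVAS_W * SLIDE_W))
    height = max(1, int(h / CANVAS_H * SLIDE_H))
    return left, top, width, height


# An element placed at the canvas center, a quarter of the canvas in size.
print(canvas_to_emu(640, 360, 320, 180))
```

Clamping width and height to at least 1 EMU avoids zero-sized shapes when the JSON carries a degenerate size, such as a perfectly horizontal line.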

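Implementation notes in brief:

  • group: recursively add the parent's position, flatten, then draw onto the slide.
  • Text: a paragraph tree (type: paragraph) plus text runs, mapped to python-pptx paragraphs and runs (font size, bold, color, named colors from themeColors, and so on).
  • Images: the JSON usually holds relative paths that must be joined with a CDN root URL; the image host may check Referer, so changing only the User-Agent is often not enough.
  • Sidecar 1.txt: one full image URL per line, indexed by path (domain and query stripped). When a relative path matches an index key, the listed URLs are tried first, in file order, before the CDN-joined URLs (for example, 七牛 (Qiniu) processing URLs that carry imageView2).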
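The group handling in the first bullet can be illustrated in isolation. The sketch below assumes, as the script does, that a child's position is relative to its parent group and that hidden elements (show set to false) are skipped along with their children; group rotation is not handled. The sample data is made up.

```python
def flatten(elements, ox=0.0, oy=0.0):
    """Return (type, abs_x, abs_y) for every visible non-group element."""
    out = []
    for el in elements or []:
        if not el.get("show", True):
            continue  # hidden elements (and their children) are dropped
        pos = el.get("position") or {}
        x = float(pos.get("x", 0)) + ox
        y = float(pos.get("y", 0)) + oy
        if el.get("type") == "group":
            # Children inherit the accumulated offset of all ancestors.
            out.extend(flatten(el.get("elements"), x, y))
        else:
            out.append((el.get("type"), x, y))
    return out


# Made-up sample: a group at (100, 50) holding an image and a nested group.
elements = [
    {
        "type": "group",
        "position": {"x": 100, "y": 50},
        "elements": [
            {"type": "img", "position": {"x": 10, "y": 20}},
            {
                "type": "group",
                "position": {"x": 5, "y": 5},
                "elements": [{"type": "font", "position": {"x": 1, "y": 2}}],
            },
        ],
    },
    {"type": "shape", "position": {"x": 0, "y": 0}, "show": False},
]
print(flatten(elements))
```

Flattening first keeps the drawing code free of recursion: every element arrives with absolute canvas coordinates and can be scaled to the slide directly.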

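4) Dependencies and usage#

pip install python-pptx requests
python mz_to_pptx.py your.mz -o out.pptx
# Or: python mz_to_pptx.py --url 'https://.../xxx.mz'
# --image-base may be given multiple times for image host prefixes; a 1.txt in the same directory is picked up automatically as the image URL list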

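Full script (mz → pptx)#

Below is the full text of the mz_to_pptx.py I currently use: the entry point reads a local file or downloads via --url; load_mz handles .mz → dict (the JSON root object); convert walks boards and writes the .pptx. If the CDN or Referer of your actual site differs from the defaults, change DEFAULT_IMAGE_BASES and the default value of --referer in argparse.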

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Convert an online-editor .mz file (JSON wrapped in base64 + zlib/gzip) into an editable .pptx.
Images are stored in the JSON as relative paths (e.g. 588552/ppt/media/image3.png) and are joined with a CDN prefix by default;
a list file with one full URL per line (e.g. 1.txt) can also be supplied and is tried first; on failure a grey placeholder box is inserted.
"""
from __future__ import annotations

import argparse
import base64
import io
import json
import re
import sys
import zlib
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Tuple
from urllib.parse import urljoin, urlparse

import requests
from pptx import Presentation
from pptx.dml.color import RGBColor
from pptx.enum.shapes import MSO_CONNECTOR, MSO_SHAPE
from pptx.enum.text import MSO_ANCHOR, MSO_AUTO_SIZE, PP_ALIGN
from pptx.util import Inches, Pt

# CDN prefixes tried, in order, for relative image paths.
DEFAULT_IMAGE_BASES = [
    "https://ppt-qn.molishe.com/",
    "https://imgs-qn.molishe.com/",
]

# Logical canvas size used by the editor's coordinates.
CANVAS_W = 1280.0
CANVAS_H = 720.0
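

def decompress_mz_bytes(raw: bytes) -> dict:
    """Decode plain JSON, or base64 text wrapping zlib/gzip-compressed JSON."""
    if raw.lstrip().startswith(b"{"):
        return json.loads(raw.decode("utf-8"))
    b64 = raw.decode("ascii").strip()
    data = base64.b64decode(b64)
    # 47 = auto-detect zlib/gzip header, 31 = gzip, 15 = zlib.
    for wbits in (47, 31, 15):
        try:
            return json.loads(zlib.decompress(data, wbits).decode("utf-8"))
        except Exception:
            continue
    raise ValueError("Cannot decompress .mz content (tried zlib/gzip wrappers)")


def load_mz(path: Path) -> dict:
    """Read a .mz file and return the JSON root object."""
    raw = path.read_bytes()
    # "eJ" is the usual base64 prefix of zlib-compressed data.
    if raw.lstrip().startswith(b"eJ"):
        return decompress_mz_bytes(raw)
    try:
        return json.loads(raw.decode("utf-8"))
    except Exception:
        return decompress_mz_bytes(raw)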


def theme_map(root: dict) -> Dict[str, str]:
    """Collect named theme colors that are valid 6-digit hex values."""
    tc = root.get("themeColors") or {}
    m: Dict[str, str] = {}
    for k, v in tc.items():
        if isinstance(v, str) and re.fullmatch(r"[0-9A-Fa-f]{6}", v):
            m[k] = v.upper()
    return m


def resolve_color(
    value: str,
    theme: Dict[str, str],
) -> Optional[RGBColor]:
    """Resolve a hex string or theme color name; None means no fill/color."""
    if not value or value == "transparent":
        return None
    if re.fullmatch(r"[0-9A-Fa-f]{6}", value):
        h = value
        return RGBColor(int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))
    if value in theme:
        h = theme[value]
        return RGBColor(int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))
    return None


def iter_paragraphs(texts: Any) -> Iterable[dict]:
    if not isinstance(texts, list):
        return
    for p in texts:
        if isinstance(p, dict) and p.get("type") == "paragraph":
            yield p


def extract_runs(paragraph: dict) -> List[dict]:
    runs: List[dict] = []
    for ch in paragraph.get("children") or []:
        if isinstance(ch, dict) and ch.get("type") == "text":
            runs.append(ch)
        elif isinstance(ch, dict) and "text" in ch:
            runs.append(ch)
    return runs


def map_align(s: Optional[str]) -> int:
    if s == "center":
        return PP_ALIGN.CENTER
    if s == "right":
        return PP_ALIGN.RIGHT
    if s == "justify":
        return PP_ALIGN.JUSTIFY
    return PP_ALIGN.LEFT


def map_valign(s: Optional[str]) -> int:
    if s in ("mid", "middle", "center"):
        return MSO_ANCHOR.MIDDLE
    if s in ("down", "bottom"):
        return MSO_ANCHOR.BOTTOM
    return MSO_ANCHOR.TOP


def load_image_url_list_file(path: Path) -> Dict[str, List[str]]:
    """
    Read a file with one image URL per line and index it by path (no domain, no query),
    e.g. https://ppt-qn.molishe.com/588552/ppt/media/image3.png?... -> key 588552/ppt/media/image3.png
    URLs under the same key keep their file order and are de-duplicated.
    """
    out: Dict[str, List[str]] = {}
    text = path.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parsed = urlparse(line)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        key = parsed.path.lstrip("/")
        if not key:
            continue
        out.setdefault(key, []).append(line)
    for k in list(out.keys()):
        seen: set = set()
        uniq: List[str] = []
        for u in out[k]:
            if u not in seen:
                seen.add(u)
                uniq.append(u)
        out[k] = uniq
    return out


def try_fetch_image(
    src: str,
    bases: List[str],
    session: requests.Session,
    referer: Optional[str],
    url_list_by_path: Optional[Dict[str, List[str]]] = None,
) -> Optional[bytes]:
    """Try candidate URLs in order and return the first usable image body."""
    if not src:
        return None
    urls: List[str] = []
    seen: set = set()

    def add(u: str) -> None:
        if u and u not in seen:
            seen.add(u)
            urls.append(u)

    if src.startswith("http://") or src.startswith("https://"):
        add(src)
    else:
        # URLs from the list file come first, then the CDN-joined ones.
        norm = src.split("?", 1)[0].lstrip("/")
        if url_list_by_path and norm in url_list_by_path:
            for u in url_list_by_path[norm]:
                add(u)
        for b in bases:
            add(urljoin(b.rstrip("/") + "/", src.lstrip("/")))
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    }
    if referer:
        headers["Referer"] = referer
    for u in urls:
        try:
            r = session.get(u, headers=headers, timeout=20)
            # Skip JSON/HTML error bodies that come back with HTTP 200.
            # (The original compared an 8-byte slice with shorter literals,
            # so this check never rejected anything.)
            head = r.content[:64].lstrip()
            if r.status_code == 200 and not head.startswith((b"{", b"<!DOC")):
                ct = (r.headers.get("content-type") or "").lower()
                if "json" in ct:
                    continue
                return r.content
        except Exception:
            continue
    return None


def flatten_elements(
    elements: List[dict],
    ox: float = 0.0,
    oy: float = 0.0,
) -> List[Tuple[dict, float, float]]:
    """Expand groups recursively, accumulating parent offsets."""
    out: List[Tuple[dict, float, float]] = []
    for el in elements or []:
        if not el.get("show", True):
            continue
        pos = el.get("position") or {}
        x = float(pos.get("x", 0)) + ox
        y = float(pos.get("y", 0)) + oy
        if el.get("type") == "group":
            inner = el.get("elements") or []
            out.extend(flatten_elements(inner, x, y))
        else:
            out.append((el, x, y))
    return out


def emu_xywh(
    slide_w: int,
    slide_h: int,
    x: float,
    y: float,
    w: float,
    h: float,
) -> Tuple[int, int, int, int]:
    """Scale a canvas rectangle to slide EMU coordinates."""
    left = int(x / CANVAS_W * slide_w)
    top = int(y / CANVAS_H * slide_h)
    width = max(1, int(w / CANVAS_W * slide_w))
    height = max(1, int(h / CANVAS_H * slide_h))
    return left, top, width, height


def apply_text_frame(
    text_frame: Any,
    texts: Any,
    theme: Dict[str, str],
    default_align: str = "left",
    default_valign: str = "up",
) -> None:
    """Fill a python-pptx text frame from the JSON paragraph tree."""
    text_frame.clear()
    text_frame.auto_size = MSO_AUTO_SIZE.NONE
    first = True
    paras = list(iter_paragraphs(texts))
    if not paras:
        p0 = text_frame.paragraphs[0]
        p0.text = ""
        text_frame.vertical_anchor = map_valign(default_valign)
        return
    for para in paras:
        # clear() leaves one empty paragraph; reuse it for the first entry.
        p = text_frame.paragraphs[0] if first else text_frame.add_paragraph()
        first = False
        p.alignment = map_align(para.get("textAlign") or default_align)
        for run_data in extract_runs(para):
            run = p.add_run()
            run.text = str(run_data.get("text", ""))
            fs = run_data.get("fontSize")
            if isinstance(fs, (int, float)) and fs > 0:
                run.font.size = Pt(float(fs))
            fam = str(run_data.get("fontFamily") or "")
            if "Bold" in fam or str(run_data.get("fontWeight")).lower() == "bold":
                run.font.bold = True
            fc = run_data.get("fontColor")
            rgb = resolve_color(str(fc), theme) if fc else None
            if rgb is not None:
                run.font.color.rgb = rgb
            if fam:
                # Drop the weight suffix, e.g. "Family-Bold" -> "Family".
                name = fam.split("-")[0] if "-" in fam else fam
                if name.startswith("SourceHan"):
                    run.font.name = "微软雅黑"
                else:
                    run.font.name = name[:31]
    text_frame.vertical_anchor = map_valign(default_valign)


def add_board(
    slide: Any,
    board: dict,
    theme: Dict[str, str],
    image_bases: List[str],
    session: requests.Session,
    referer: Optional[str],
    url_list_by_path: Optional[Dict[str, List[str]]] = None,
) -> None:
    """Render one board (page) onto a slide."""
    prs = slide.part.package.presentation_part.presentation
    slide_w, slide_h = prs.slide_width, prs.slide_height

    # Background color.
    bg = board.get("bgColor")
    if bg and bg != "transparent":
        rgb = resolve_color(bg, theme)
        if rgb is not None:
            try:
                fill = slide.background.fill
                fill.solid()
                fill.fore_color.rgb = rgb
            except Exception:
                pass

    # Background image, stretched to the full slide.
    bg_img = board.get("bgImage") or {}
    src = bg_img.get("src")
    if src:
        data = try_fetch_image(
            str(src),
            image_bases,
            session,
            referer,
            url_list_by_path,
        )
        if data:
            bio = io.BytesIO(data)
            slide.shapes.add_picture(bio, 0, 0, width=slide_w, height=slide_h)

    items = flatten_elements(board.get("elements") or [])
    for el, x, y in items:
        et = el.get("type")
        size = el.get("size") or {}
        w = float(size.get("width", 1))
        h = float(size.get("height", 1))
        left, top, width, height = emu_xywh(slide_w, slide_h, x, y, w, h)

        if et == "img":
            raw = try_fetch_image(
                str(el.get("src") or ""),
                image_bases,
                session,
                referer,
                url_list_by_path,
            )
            if raw:
                pic = slide.shapes.add_picture(io.BytesIO(raw), left, top, width=width, height=height)
                pic.rotation = float(el.get("rotate") or 0)
            else:
                # Grey placeholder when the image cannot be downloaded.
                shp = slide.shapes.add_shape(MSO_SHAPE.RECTANGLE, left, top, width, height)
                shp.fill.solid()
                shp.fill.fore_color.rgb = RGBColor(230, 230, 230)
                shp.line.fill.background()
                tf = shp.text_frame
                tf.clear()
                p = tf.paragraphs[0]
                p.text = "[image failed to load]"
                p.alignment = PP_ALIGN.CENTER
                tf.vertical_anchor = MSO_ANCHOR.MIDDLE
            continue

        if et == "line":
            ls = el.get("lineStyle") or {}
            lw = float(ls.get("lineWidth") or 1)
            color = resolve_color(str(ls.get("lineColor") or "000000"), theme)
            x2 = left + width
            y2 = top + height
            conn = slide.shapes.add_connector(MSO_CONNECTOR.STRAIGHT, left, top, x2, y2)
            ln = conn.line
            # Convert CSS pixels (96 dpi) to points, with a 0.25 pt minimum.
            ln.width = Pt(max(0.25, lw * 72 / 96))
            if color is not None:
                ln.color.rgb = color
            conn.rotation = float(el.get("rotate") or 0)
            continue

        if et == "shape":
            # Every defaultShape type ("rect", "custom", and anything else)
            # is currently drawn as a rectangle.
            shp = slide.shapes.add_shape(MSO_SHAPE.RECTANGLE, left, top, width, height)
            fill_rgb = resolve_color(str(el.get("color") or "transparent"), theme)
            if fill_rgb is not None:
                shp.fill.solid()
                shp.fill.fore_color.rgb = fill_rgb
            else:
                shp.fill.background()
            shp.line.fill.background()
            shp.rotation = float(el.get("rotate") or 0)
            tc = el.get("textContent")
            if tc and tc.get("show", True):
                texts = tc.get("texts")
                valign = tc.get("verticalAlign") or "up"
                apply_text_frame(
                    shp.text_frame,
                    texts,
                    theme,
                    default_valign=str(valign),
                )
            continue

        if et == "font":
            box = slide.shapes.add_textbox(left, top, width, height)
            box.rotation = float(el.get("rotate") or 0)
            fs = el.get("fontStyle") or {}
            align = fs.get("textAlign") or "left"
            valign = el.get("verticalAlign") or "up"
            apply_text_frame(
                box.text_frame,
                el.get("texts"),
                theme,
                default_align=str(align),
                default_valign=str(valign),
            )
            continue


def convert(
    mz_path: Path,
    out_path: Path,
    image_bases: List[str],
    referer: Optional[str],
    url_list_by_path: Optional[Dict[str, List[str]]] = None,
) -> None:
    """Load a .mz file and write one slide per board to out_path."""
    root = load_mz(mz_path)
    boards = root.get("boards") or []
    if not boards:
        raise SystemExit("No boards in the JSON; cannot build slides")
    theme = theme_map(root)
    prs = Presentation()
    # 16:9 slide size in inches.
    prs.slide_width = int(Inches(13.333))
    prs.slide_height = int(Inches(7.5))
    blank = prs.slide_layouts[6]
    session = requests.Session()
    for board in boards:
        slide = prs.slides.add_slide(blank)
        add_board(
            slide,
            board,
            theme,
            image_bases,
            session,
            referer,
            url_list_by_path,
        )
    out_path.parent.mkdir(parents=True, exist_ok=True)
    prs.save(str(out_path))
    print(f"Wrote: {out_path} ({len(boards)} slides)")


def main() -> None:
    ap = argparse.ArgumentParser(description="Convert .mz to .pptx")
    ap.add_argument(
        "input",
        type=Path,
        nargs="?",
        default=None,
        help="Path to the .mz file (use either this or --url)",
    )
    ap.add_argument(
        "--url",
        default=None,
        help="Download the .mz from this URL, then convert",
    )
    ap.add_argument("-o", "--output", type=Path, default=None, help="Output .pptx path")
    ap.add_argument(
        "--image-base",
        action="append",
        default=[],
        help="Prefix URL for relative image paths; may be given multiple times",
    )
    ap.add_argument(
        "--image-urls-file",
        type=Path,
        default=None,
        help="Text file with one full image URL per line (e.g. 1.txt); if omitted, a 1.txt next to the .mz is loaded automatically",
    )
    ap.add_argument("--referer", default="https://www.molishe.com/", help="Referer sent when requesting images")
    args = ap.parse_args()

    if args.url:
        sess = requests.Session()
        r = sess.get(args.url, timeout=60)
        r.raise_for_status()
        tmp = Path.cwd() / "_mz_download_temp.mz"
        tmp.write_bytes(r.content)
        inp = tmp
        stem = Path(args.url.split("?", 1)[0]).stem
        out = args.output or (Path.cwd() / f"{stem}.pptx")
    else:
        if not args.input:
            ap.error("Provide an input path or use --url")
        inp = args.input.expanduser().resolve()
        out = args.output or inp.with_suffix(".pptx")

    bases = args.image_base if args.image_base else list(DEFAULT_IMAGE_BASES)

    url_list_path: Optional[Path] = None
    if args.image_urls_file is not None:
        url_list_path = args.image_urls_file.expanduser().resolve()
    else:
        sidecar = inp.resolve().parent / "1.txt"
        if sidecar.is_file():
            url_list_path = sidecar
    url_list_by_path: Optional[Dict[str, List[str]]] = None
    if url_list_path is not None and url_list_path.is_file():
        url_list_by_path = load_image_url_list_file(url_list_path)
        print(f"Loaded image URL list: {url_list_path} ({len(url_list_by_path)} paths)")

    convert(
        inp,
        out.expanduser().resolve(),
        bases,
        args.referer or None,
        url_list_by_path,
    )

    # Remove the temporary download.
    if args.url:
        try:
            inp.unlink()
        except OSError:
            pass


if __name__ == "__main__":
    main()

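Compliance and summary#

  • Copyright and license scope of AIPPT output and template assets are governed by the statements of 觅知网 and the rights holders. This post is only a format- and engineering-oriented record and does not encourage unauthorized redistribution or commercial scraping.
  • The pipeline can be summarized as: edit online → obtain the .mz → decompress/parse into JSON → download resources and map boards to PPTX. If you run into a similar "proprietary wrapper + JSON semantic tree" deliverable, pin down the wrapper rules, the root object's paging field, and the resource URL rules first; writing the converter afterwards is much easier.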

https://www.51miz.com/
Author: 孔大夫
Published: 2026-05-07
License: CC BY-NC-SA 4.0

Some information may be outdated.